Yaniv David, Ph.D. Thesis Seminar
We address the problem of binary code search in stripped executables (with no debug information). The main challenge is establishing binary code similarity even when the code has been compiled with different compilers, at different optimization levels, and for diverse architectures. Moreover, the source code being compiled might come from another version of the software package or from another implementation altogether. Overcoming these challenges while avoiding false positives is invaluable for guiding other, more costly tasks in binary code analysis, such as automated reverse engineering and vulnerability detection.
We present an iterative process of explaining and addressing the different parts of the binary similarity problem. At each step, we further refine our similarity method: improving our representations for the binary code while incorporating techniques from other fields to create a measure for binary similarity between procedures. These fields include model theory, statistical frameworks, SMT solvers and deep neural networks.
We tested our methods in real-world scenarios by applying them to search-based vulnerability detection and to name prediction for binary procedures. We discovered 373 vulnerabilities affecting publicly available firmware, 147 of them in the latest available firmware version for the device, and successfully predicted procedure names, improving on the state of the art by 20% and by 84% over state-of-the-art neural models that do not use any static analysis.