Segmented Symbolic Analysis

Wei Le

Rochester Institute of Technology, USA

Track: Technical Research
Session: Formal Analysis
Symbolic analysis is indispensable for software tools that require program semantic information at compile time. However, determining symbolic values for program variables related to loops and library calls is challenging, as the computation and data related to loops can have statically unknown bounds, and the library sources are typically not available at compile time. In this paper, we propose segmented symbolic analysis, a hybrid technique that enables fully automatic symbolic analysis even for the traditionally challenging code of library calls and loops. The novelties of this work are threefold: 1) we flexibly weave symbolic and concrete executions on the selected parts of the program based on demand; 2) dynamic executions are performed on the unit tests constructed from the code segments to infer program semantics needed by static analysis; and 3) the dynamic information from multiple runs is aggregated via regression analysis. We developed the Helium framework, consisting of a static component that performs symbolic analysis and partitions a program, a dynamic analysis that synthesizes unit tests and automatically infers symbolic values for program variables, and a protocol that enables static and dynamic analyses to be run interactively and concurrently. Our experimental results show that by handling loops and library calls that a traditional symbolic analysis cannot process, segmented symbolic analysis detects 5 times more buffer overflows. The technique is scalable for real-world programs such as putty, tightvnc and snort.