Automatically Detecting and Mitigating Issues in Program Analyzers

In recent years, the formal methods community has made significant progress towards the development of industrial-strength static analysis tools that can check properties of real-world production code. Such tools can help developers detect potential bugs and security vulnerabilities in critical software before deployment. While the potential benefits of static analysis tools are clear, their usability and effectiveness in mainstream software development workflows often come into question, which can prevent software developers from using these tools to their full potential. In this dissertation, we focus on two major challenges that can limit the ability of these tools to be incorporated into software development workflows.

The first challenge is unintentional unsoundness. Static program analyzers are complicated tools that implement sophisticated algorithms and performance heuristics. This makes them highly susceptible to undetected, unintentional soundness issues. Such issues can cause false negatives and have disastrous consequences, e.g., when analyzing safety-critical software. In this dissertation, we present novel techniques to detect unintentional unsoundness bugs in two foundational program analysis tools, namely SMT solvers and Datalog engines. These tools are used extensively by the formal methods community, for instance, in software verification, systematic testing, and program synthesis. We implemented these techniques as easy-to-use open-source tools that are publicly available on GitHub. With the proposed techniques, we were able to detect more than 55 unique and confirmed critical soundness bugs in popular and widely used SMT solvers and Datalog engines in only a few months of testing.

The second challenge is finding the right balance between soundness, precision, and performance. In an ideal world, a static analyzer should be as precise as possible while maintaining soundness and being sufficiently fast. However, to overcome undecidability issues, these tools have to employ a variety of techniques to be practical, for example, compromising on the soundness of the analysis or approximating code behavior. Static analyzers are therefore not trivial to integrate into arbitrary usage scenarios with different program sizes, resource constraints, and SLAs. Most of the time, these tools also do not scale to large industrial code bases containing millions of lines of code. This makes it extremely challenging to get the most out of these analyzers and to integrate them into everyday development activities, especially for average software development teams with little to no knowledge of advanced static analysis techniques. In this dissertation, we present an approach to automatically tailor an abstract interpreter to the code under analysis and any given resource constraints. We implemented our technique as an open-source framework, which is publicly available on GitHub. The second contribution of this dissertation in this challenge area is a technique to horizontally scale analysis tools in cloud-based static analysis platforms by splitting the input to the analyzer into partitions and analyzing the partitions independently. The technique was developed in collaboration with Amazon Web Services and is now being used in production in their CodeGuru service.

Metadata
Author:Muhammad Numair Mansur
URN:urn:nbn:de:hbz:386-kluedo-72353
DOI:https://doi.org/10.26204/KLUEDO/7235
Advisor:Maria Christakis
Document Type:Doctoral Thesis
Language of publication:English
Date of Publication (online):2023/04/12
Year of first Publication:2023
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:2023/03/31
Date of the Publication (Server):2023/04/14
Page Number:X,159
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
DDC-Classification:0 General works, computer science, information science / 004 Computer science
Licence (German):Creative Commons 4.0 - Attribution (CC BY 4.0)