Open access
Author
Date
2024
Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Software underpins every critical aspect of modern society, including communication, finance, and transportation. Ensuring the reliability of software systems is therefore crucial. Yet guaranteeing that non-trivial software systems are free of defects is extremely difficult, if not impossible. Consequently, modern software systems are full of bugs, such as security vulnerabilities, semantic bugs, and performance issues.
The motivating question of this thesis is: where can software go wrong? Software development is an intricate process with many stages in its pipeline. Beyond the source code written by developers, many other tools are involved, such as code analysis tools that identify defects and compilers that translate source code into machine code. Unfortunately, all of them can go wrong. In this thesis, we study the reliability problem at three levels: code, code analysis, and code compilation. At a high level, we design new methodologies to detect bugs at each of these levels.
For the reliability of code, we focus on eliminating undefined behavior, a major source of reliability bugs such as buffer overflows and use-after-free errors, in modern C/C++ software. We develop a general detection approach to identify undefined behaviors practically and effectively. To improve detection efficiency, we further present two novel concepts to accelerate existing detection frameworks. For the reliability of code analysis, we aim to validate existing bug detection tools for undefined behaviors. We propose and design the first program generator that can automatically produce a large number of programs with various undefined behaviors. We then use this generator to validate sanitizers, one of the most popular toolsets for undefined behavior detection. For the reliability of code compilation, we concentrate on hardening modern compiler implementations. We introduce a novel data-driven program generation technique that can generate expressive and well-formed programs based on real-world code snippets.
At the conceptual level, this thesis highlights the prevalence of reliability problems in the software development pipeline, from code to compilation. At the technical level, this thesis presents five new tools for detecting software defects in source code, code analysis tools, and compilers.
Permanent link
https://doi.org/10.3929/ethz-b-000676103
Publication status
published
Publisher
ETH Zurich
Subject
Programming Languages; Computer security; Software engineering; Compilers
Organisational unit
02150 - Dep. Informatik / Dep. of Computer Science
09628 - Su, Zhendong / Su, Zhendong