Abstract
Failure diagnosis in practical systems is difficult, and the main obstacle is that the information a developer has access to is limited. This information is usually not enough to help developers fix or even locate the related bug. Moreover, due to the vast difference between the development and production environments, it is not trivial to reproduce failures from the production environment in the development environment. When failures are caused by non-deterministic events such as race conditions or unforeseen inputs, reproducing them is even more challenging.
In this paper, we present Investigator, a failure diagnosis framework for practical systems running on Arm. At runtime, Investigator leverages the hardware tracing component called Embedded Trace Macrocell (ETM) and a lightweight event capturer to collect information with low overhead. With the collected trace and analysis, Investigator identifies the control and data flow related to the cause of a failure, which helps developers in bug fixing. We implemented a prototype of Investigator and evaluated it with real-world bugs. The results show that Investigator diagnoses these bugs effectively and efficiently while introducing a low performance overhead at runtime.
In this paper, we present Investigator, a failure diagnosis framework for practical systems running on Arm. At runtime, Investigator leverages the hardware tracing component called Embedded Trace Macrocell (ETM) and a lightweight event capturer to collect information with low overhead. With the collected trace and analysis, Investigator identifies the control and data flow related to the cause of a failure, which helps developers in bug fixing. We implemented a prototype of Investigator and evaluated it with real-world bugs. The results show that Investigator diagnoses these bugs effectively and efficiently while introducing a low performance overhead at runtime.
Original language | English |
---|---|
Title of host publication | Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis |
Pages | 917–928 |
Publication status | Published - Jul 2023 |