Feature selection and embedding based cross project framework for identifying crashing fault residence

Zhou Xu, Tao Zhang, Jacky Keung, Meng Yan, Xiapu Luo, Xiaohong Zhang, Ling Xu, Yutian Tang

Research output: Journal article publicationJournal articleAcademic researchpeer-review

9 Citations (Scopus)

Abstract

Context: The automatically produced crash reports are able to analyze the root of fault causing the crash (crashing fault for short) which is a critical activity for software quality assurance. Objective: Correctly predicting the existence of crashing fault residence in stack traces of crash report can speed up program debugging process and optimize debugging efforts. Existing work focused on the collected label information from bug-fixing logs, and the extracted features of crash instances from stack traces and source code for Identification of Crashing Fault Residence (ICFR) of newly-submitted crashes. This work develops a novel cross project ICFR framework to address the data scarcity problem by using labeled crash data of other project for the ICFR task of the project at hand. This framework removes irrelevant features, reduces distribution differences, and eases the class imbalance issue of cross project data since these factors may negatively impact the ICFR performance. Method: The proposed framework, called FSE, combines Feature Selection and feature Embedding techniques. The FSE framework first uses an information gain ratio based feature ranking method to select a relevant feature subset for cross project data, and then employs a state-of-the-art Weighted Balanced Distribution Adaptation (WBDA) method to map features of cross project data into a common space. WBDA considers both marginal and conditional distributions as well as their weights to reduce data distribution discrepancies. Besides, WBDA balances the class proportion of each project data to alleviate the class imbalance issue. Results: We conduct experiments on 7 projects to evaluate the performance of our FSE framework. The results show that FSE outperforms 25 methods under comparison. Conclusion: This work proposes a cross project learning framework for ICFR, which uses feature selection and embedding to remove irrelevant features and reduce distribution differences, respectively. The results illustrate the performance superiority of our FSE framework.

Original languageEnglish
Article number106452
Pages (from-to)1-20
JournalInformation and Software Technology
Volume131
DOIs
Publication statusPublished - Mar 2021

Keywords

  • Crashing fault
  • Cross project framework
  • Feature embedding
  • Feature selection
  • Stack trace

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Feature selection and embedding based cross project framework for identifying crashing fault residence'. Together they form a unique fingerprint.

Cite this