TY - JOUR
T1 - Feature selection and embedding based cross project framework for identifying crashing fault residence
AU - Xu, Zhou
AU - Zhang, Tao
AU - Keung, Jacky
AU - Yan, Meng
AU - Luo, Xiapu
AU - Zhang, Xiaohong
AU - Xu, Ling
AU - Tang, Yutian
N1 - Funding Information:
This work is supported by the National Key Research and Development Project (No. 2018YFB2101200), the National Natural Science Foundation of China (No. 62002034), China Postdoctoral Science Foundation (No. 2020M673137, No. 2017M621247), the Natural Science Foundation of Chongqing in China (No. cstc2020jcyj-bshX0114), the Science and Technology Development Fund of Macau (No. 0047/2020/A1), Faculty Research Grant Projects of MUST (No. FRG-20-008-FI), Hong Kong Research Grant Council Project (No. 152239/18E), the General Research Fund of the Research Grant Council of Hong Kong (No. 11208017), the Fundamental Research Funds for the Central Universities (No. 2020CDJQY-A021, No. 2019CDYGYB014).
Publisher Copyright:
© 2020
PY - 2021/3
Y1 - 2021/3
N2 - Context: Automatically produced crash reports can be used to analyze the root cause of the fault that triggers a crash (the crashing fault, for short), which is a critical activity in software quality assurance. Objective: Correctly predicting whether the crashing fault resides in the stack trace of a crash report can speed up program debugging and focus debugging effort. Existing work has used label information collected from bug-fixing logs, together with features of crash instances extracted from stack traces and source code, for Identification of Crashing Fault Residence (ICFR) of newly submitted crashes. This work develops a novel cross project ICFR framework that addresses the data scarcity problem by using labeled crash data from other projects for the ICFR task of the project at hand. The framework removes irrelevant features, reduces distribution differences, and eases the class imbalance of cross project data, since these factors may degrade ICFR performance. Method: The proposed framework, called FSE, combines Feature Selection and feature Embedding techniques. FSE first uses an information gain ratio based feature ranking method to select a relevant feature subset for cross project data, and then employs a state-of-the-art Weighted Balanced Distribution Adaptation (WBDA) method to map the features of cross project data into a common space. WBDA considers both marginal and conditional distributions, as well as their weights, to reduce data distribution discrepancies. In addition, WBDA balances the class proportion of each project's data to alleviate class imbalance. Results: We conduct experiments on 7 projects to evaluate the performance of the FSE framework. The results show that FSE outperforms 25 methods under comparison. Conclusion: This work proposes a cross project learning framework for ICFR that uses feature selection and feature embedding to remove irrelevant features and reduce distribution differences, respectively. The results illustrate the performance superiority of the FSE framework.
KW - Crashing fault
KW - Cross project framework
KW - Feature embedding
KW - Feature selection
KW - Stack trace
UR - http://www.scopus.com/inward/record.url?scp=85096664521&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2020.106452
DO - 10.1016/j.infsof.2020.106452
M3 - Journal article
AN - SCOPUS:85096664521
SN - 0950-5849
VL - 131
SP - 1
EP - 20
JO - Information and Software Technology
JF - Information and Software Technology
M1 - 106452
ER -