Cross version defect prediction with representative data via sparse subset selection

Zhou Xu, Shuai Li, Yutian Tang, Xiapu Luo, Tao Zhang, Jin Liu, Jun Xu

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

9 Citations (Scopus)

Abstract

Software defect prediction aims to detect defect-prone software modules by mining historical development data from software repositories. Identifying such modules at an early stage of development can save large amounts of resources. Cross Version Defect Prediction (CVDP) is a practical scenario in which a classification model is trained on the historical data of the prior version and then used to predict the defect labels of modules in the current version. However, software development is a constantly evolving process, which leads to differences in data distribution across versions of the same project. These distribution differences degrade the performance of the classification model. In this paper, we approach this issue by leveraging a state-of-the-art Dissimilarity-based Sparse Subset Selection (DS3) method. This method selects a representative subset of modules from the prior version based on the pairwise dissimilarities between the modules of the two versions, and assigns each module of the current version to one of the representative modules. The selected modules represent the modules of the current version well, thus mitigating the distribution differences. We evaluate the effectiveness of DS3 for CVDP on a total of 40 cross-version pairs from 56 versions of 15 projects, using three traditional and two effort-aware indicators. Extensive experiments show that DS3 outperforms three baseline methods, especially in terms of the two effort-aware indicators.
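
The select-then-assign idea in the abstract can be illustrated with a minimal sketch. It is not the DS3 method evaluated in the paper: the greedy loop below is a simplified stand-in that trades encoding cost against the number of representatives. The Euclidean dissimilarity, the `penalty` parameter, the toy feature vectors, and all function names are assumptions made for illustration only.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two metric vectors (an assumed dissimilarity)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(prior, current):
    """D[i][j] = dissimilarity between prior module i and current module j."""
    return [[euclidean(p, c) for c in current] for p in prior]


def select_representatives(D, penalty):
    """Greedy stand-in for DS3 (NOT the DS3 optimization itself).

    Adds prior-version modules one at a time for as long as doing so lowers
    (sum of each current module's dissimilarity to its closest representative)
    + penalty * (number of representatives).
    """
    n_prior, n_current = len(D), len(D[0])
    selected = []
    closest = [float("inf")] * n_current  # cost to the closest representative so far
    objective = float("inf")
    while len(selected) < n_prior:
        best_i, best_obj, best_costs = None, objective, None
        for i in range(n_prior):
            if i in selected:
                continue
            costs = [min(closest[j], D[i][j]) for j in range(n_current)]
            obj = sum(costs) + penalty * (len(selected) + 1)
            if obj < best_obj:
                best_i, best_obj, best_costs = i, obj, costs
        if best_i is None:  # no candidate improves the objective
            break
        selected.append(best_i)
        closest, objective = best_costs, best_obj
    # Assign each current-version module to its closest representative.
    assignment = [min(selected, key=lambda i: D[i][j]) for j in range(n_current)]
    return selected, assignment


# Toy metric vectors (hypothetical data, two features per module).
prior = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 10.0]]
current = [[0.0, 0.1], [5.0, 5.2], [0.2, 0.0]]
reps, assign = select_representatives(dissimilarity_matrix(prior, current), penalty=1.0)
print(reps, assign)
```

In a cross-version setting, the prior-version modules indexed by `reps`, together with their known defect labels, would serve as the training set for the classifier applied to the current version. A larger `penalty` yields fewer representatives.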

Original language: English
Title of host publication: Proceedings - 2018 ACM/IEEE 26th International Conference on Program Comprehension, ICPC 2018
Publisher: IEEE Computer Society
Pages: 132-143
Number of pages: 12
ISBN (Print): 9781450357142
Publication status: Published - 28 May 2018
Event: ACM/IEEE 26th International Conference on Program Comprehension, ICPC 2018, collocated with the 40th International Conference on Software Engineering, ICSE 2018 - Gothenburg, Sweden
Duration: 27 May 2018 - 28 May 2018

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: ACM/IEEE 26th International Conference on Program Comprehension, ICPC 2018, collocated with the 40th International Conference on Software Engineering, ICSE 2018
Country: Sweden
City: Gothenburg
Period: 27/05/18 - 28/05/18

Keywords

  • cross version defect prediction
  • pairwise dissimilarities
  • representative data
  • sparse subset selection

ASJC Scopus subject areas

  • Software
