Cross-version defect prediction via hybrid active learning with kernel principal component analysis

Zhou Xu, Jin Liu, Xiapu Luo, Tao Zhang

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

31 Citations (Scopus)

Abstract

As defects in software modules may cause product failures and financial losses, it is critical to use defect prediction methods to identify potentially defective modules for thorough inspection, especially in the early stages of the software development lifecycle. For an upcoming version of a software project, it is practical to use the labeled historical defect data of prior versions of the same project to predict defects in the current version, i.e., Cross-Version Defect Prediction (CVDP). However, software development is a dynamic, evolving process, so the data distribution (e.g., defect characteristics) may vary across versions. Furthermore, the raw features often fail to reveal the intrinsic structure of the data. These two issues make effective CVDP challenging. In this paper, we propose a two-stage CVDP framework that combines Hybrid Active Learning and Kernel PCA (HALKP) to address them. In the first stage, HALKP uses a hybrid active learning method to select informative and representative unlabeled modules from the current version, queries their labels, and merges them into the labeled modules of the prior version to form an enhanced training set. In the second stage, HALKP applies kernel PCA, a non-linear mapping method, to extract representative features by embedding the original data of both versions into a high-dimensional space. We evaluate HALKP on 31 versions of 10 projects using three prevalent performance indicators. The experimental results show that HALKP achieves encouraging results, with average F-measure, g-mean, and Balance values of 0.480, 0.592, and 0.580, respectively, and significantly outperforms nearly all baseline methods.
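The first stage of HALKP can be illustrated with a small Python sketch. The abstract does not state the selection criterion, so the scoring rule here is an assumption: informativeness is taken as how close a classifier's predicted defect probability is to 0.5 (uncertainty sampling), representativeness as the mean RBF similarity of a module to the other unlabeled modules, and the two are mixed with a weight beta. The names select_queries, enhance_training_set, beta and gamma are hypothetical, and the predicted probabilities are assumed to come from a classifier trained on the prior version.

```python
import math
from typing import List, Sequence, Tuple


def rbf_similarity(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) similarity exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def select_queries(unlabeled: List[Sequence[float]], defect_probs: List[float],
                   k: int, beta: float = 0.5, gamma: float = 1.0) -> List[int]:
    """Indices of the k unlabeled modules with the highest hybrid score.

    Hypothetical scoring rule, not necessarily the criterion used by HALKP:
      informativeness    = 1 - |2p - 1|   (1.0 when p = 0.5, 0.0 when p is 0 or 1)
      representativeness = mean RBF similarity to the other unlabeled modules
      score              = beta * informativeness + (1 - beta) * representativeness
    """
    n = len(unlabeled)
    scores = []
    for i in range(n):
        informativeness = 1.0 - abs(2.0 * defect_probs[i] - 1.0)
        others = [rbf_similarity(unlabeled[i], unlabeled[j], gamma)
                  for j in range(n) if j != i]
        representativeness = sum(others) / len(others) if others else 0.0
        scores.append(beta * informativeness + (1.0 - beta) * representativeness)
    # sorted() is stable even with reverse=True, so ties keep the lower index first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def enhance_training_set(prior_X: List[Sequence[float]], prior_y: List[int],
                         current_X: List[Sequence[float]], query_idx: List[int],
                         queried_labels: List[int]) -> Tuple[list, list]:
    """Merge the newly labeled current-version modules into the prior-version data."""
    X = list(prior_X) + [current_X[i] for i in query_idx]
    y = list(prior_y) + list(queried_labels)
    return X, y


# Usage: module 0 is uncertain (p = 0.5) and close to module 1, so it ranks first.
current = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
probs = [0.5, 0.9, 0.5]
picked = select_queries(current, probs, k=2)
print(picked)  # [0, 2]
X_train, y_train = enhance_training_set([[1.0, 1.0]], [0], current, picked, [1, 0])
```

Setting beta to 1 reduces the rule to pure uncertainty sampling and beta = 0 to pure density-based sampling; how HALKP actually balances the two criteria should be checked against the paper.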
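The second stage can be sketched with NumPy as standard RBF kernel PCA. Fitting a single kernel PCA on the stacked data of both versions, the choice of the RBF kernel, and the values of gamma and n_components are assumptions made for illustration; the function names are hypothetical and the data are synthetic.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Pairwise RBF kernel matrix exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def kernel_pca_two_versions(X_prior: np.ndarray, X_current: np.ndarray,
                            n_components: int = 2, gamma: float = 0.1):
    """Project the modules of both versions onto the top kernel principal components."""
    X = np.vstack([X_prior, X_current])          # embed both versions jointly
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    # Center the kernel matrix, i.e. center the implicit high-dimensional features.
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[top], eigvecs[:, top]
    # For training point i, its projection on component k is sqrt(lambda_k) * alpha_ik.
    Z = alphas * np.sqrt(np.maximum(lambdas, 0.0))
    return Z[: len(X_prior)], Z[len(X_prior):]


rng = np.random.default_rng(0)
prior = rng.normal(size=(20, 5))     # 20 prior-version modules, 5 raw metrics (synthetic)
current = rng.normal(size=(10, 5))   # 10 current-version modules (synthetic)
Z_prior, Z_current = kernel_pca_two_versions(prior, current, n_components=3, gamma=0.2)
print(Z_prior.shape, Z_current.shape)  # (20, 3) (10, 3)
```

A classifier trained on Z_prior (together with the queried modules) would then be applied to Z_current. scikit-learn's KernelPCA class provides a maintained implementation of the same technique.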
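The three indicators reported above are commonly computed in defect prediction studies from the confusion matrix, using pd (recall, the probability of detection) and pf (the false-alarm rate): F-measure = 2 * precision * pd / (precision + pd), g-mean = sqrt(pd * (1 - pf)), and Balance = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2). The sketch below assumes the paper uses these common definitions; the function name and the zero-division handling are assumptions.

```python
import math


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def cvdp_indicators(tp: int, fp: int, tn: int, fn: int) -> dict:
    """F-measure, g-mean and Balance with the defective class as positive."""
    pd = _ratio(tp, tp + fn)          # probability of detection (recall)
    pf = _ratio(fp, fp + tn)          # probability of false alarm
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * pd, precision + pd)
    g_mean = math.sqrt(pd * (1 - pf))
    # Balance: 1 minus the normalized distance from the ideal point (pf = 0, pd = 1).
    balance = 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return {"f_measure": f_measure, "g_mean": g_mean, "balance": balance}


# Illustrative confusion matrix (made-up counts, not from the paper).
print(cvdp_indicators(tp=30, fp=20, tn=40, fn=10))
```

For these made-up counts pd = 0.75 and pf = 1/3, giving an F-measure of about 0.667, a g-mean of about 0.707 and a Balance of about 0.705.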

Original language: English
Title of host publication: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 209-220
Number of pages: 12
ISBN (Electronic): 9781538649695
Publication status: Published - 2 Apr 2018
Event: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018 - Campobasso, Italy
Duration: 20 Mar 2018 to 23 Mar 2018

Publication series

Name: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018 - Proceedings
Volume: 2018-March

Conference

Conference: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018
Country: Italy
City: Campobasso
Period: 20/03/18 to 23/03/18

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
