TSTSS: A two-stage training subset selection framework for cross version defect prediction

Zhou Xu, Shuai Li, Xiapu Luo, Jin Liu, Tao Zhang, Yutian Tang, Jun Xu, Peipei Yuan, Jacky Keung

Research output: Journal article publication › Journal article › Academic research › peer-review

13 Citations (Scopus)


Cross Version Defect Prediction (CVDP) is a practical scenario in which a classification model is trained on the historical data of a prior version and then used to predict the defect labels of modules in the current version. Unfortunately, differences in data distribution across versions may hinder the effectiveness of the trained CVDP model. Thus, it is not trivial to select a suitable training subset from the prior version to improve CVDP performance. In this paper, we propose a novel method, called Two-Stage Training Subset Selection (TSTSS), to address this challenging issue. In the first stage, TSTSS uses a sparse modeling representative selection method to select an initial module subset from the prior version that can reconstruct the data of the prior version well. In the second stage, TSTSS leverages a dissimilarity-based sparse subset selection method to further refine the selected module subset, so that the selected modules represent the modules of the current version well. Finally, we use a novel weighted extreme learning machine classifier to construct the CVDP model. We evaluate the CVDP performance of TSTSS on 50 cross-version pairs using 6 indicators. The experiments show that TSTSS can efficiently improve the CVDP performance compared with 11 baseline methods.
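
The final classification step can be illustrated with a minimal sketch of a generic weighted extreme learning machine. This is not the authors' implementation: the abstract does not specify their novel variant, so the sigmoid hidden layer, the inverse-class-size sample weights, and the names train_weighted_elm, predict_weighted_elm, n_hidden, and C are all assumptions chosen for illustration.

```python
import numpy as np

def train_weighted_elm(X, y, n_hidden=50, C=1.0, seed=0):
    """Fit a generic weighted ELM; y is an int array of 0/1 labels (1 = defective)."""
    rng = np.random.default_rng(seed)
    # Hidden-layer weights and biases are drawn at random and never trained.
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))  # sigmoid hidden outputs
    # Assumed weighting scheme: each sample gets 1 / (size of its class),
    # so the minority (defective) class is not swamped by the majority.
    counts = np.bincount(y, minlength=2)
    w = 1.0 / counts[y]
    t = 2.0 * y - 1.0  # targets in {-1, +1}
    # Closed-form regularized solution: beta = (I/C + H^T W H)^-1 H^T W t
    HtW = H.T * w
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ t)
    return W_in, b, beta

def predict_weighted_elm(model, X):
    """Return 0/1 defect predictions for the rows of X."""
    W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
    return (H @ beta > 0).astype(int)
```

In a CVDP setting, X would hold the metric vectors of the modules selected from the prior version and y their defect labels, and predict_weighted_elm would then be applied to the modules of the current version.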

Original language: English
Pages (from-to): 59-78
Number of pages: 20
Journal: Journal of Systems and Software
Publication status: Published - Aug 2019

Keywords


  • Cross version defect prediction
  • Sparse modeling
  • Training subset selection
  • Weighted extreme learning machine

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
