A comprehensive comparative study of clustering-based unsupervised defect prediction models

Zhou Xu, Li Li, Meng Yan, Jin Liu, Xiapu Luo, John Grundy, Yifeng Zhang, Xiaohong Zhang

Research output: Journal article publication › Journal article › Academic research › peer-review

2 Citations (Scopus)

Abstract

Software defect prediction recommends the most defect-prone software modules so that test resources can be allocated more effectively. The limitation of the extensively studied supervised defect prediction methods is that they require labeled software modules, which are not always available. An alternative is to apply clustering-based unsupervised models to the unlabeled defect data, an approach called Clustering-based Unsupervised Defect Prediction (CUDP). However, few studies have explored the impact of clustering-based models on defect prediction performance. In this work, we performed a large-scale empirical study on 40 unsupervised models to fill this gap. We chose an open-source dataset including 27 project versions with 3 types of features. The experimental results show that (1) different clustering-based models differ significantly in performance, and the models in the instance-violation-score-based clustering family clearly outperform those in the hierarchy-based, density-based, grid-based, sequence-based, and hybrid-based clustering families; (2) the models in the instance-violation-score-based clustering family achieve competitive performance compared with typical supervised models; (3) the impact of feature types on model performance depends on the indicators used; and (4) the clustering-based unsupervised models do not always achieve better performance on defect data with the combination of the 3 types of features.
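As a rough illustration of the general CUDP idea described in the abstract, the sketch below clusters unlabeled modules into two groups and flags the group with the larger metric values as defect-prone. It is not one of the 40 models evaluated in the study. The 2-means clustering, the "larger metrics means riskier" labeling heuristic, the function names, and the sample metric values (lines of code, cyclomatic complexity, churn) are all assumptions made for illustration.

```python
from statistics import mean


def min_max_scale(rows):
    """Rescale every metric column to [0, 1] so no single metric dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def two_means(rows, max_iters=50):
    """Small deterministic 2-means; seeds are the rows with the smallest
    and largest metric sums (an assumption, chosen for reproducibility)."""
    ordered = sorted(rows, key=sum)
    centers = [list(ordered[0]), list(ordered[-1])]
    labels = None
    for _ in range(max_iters):
        # Assign each row to the nearest center (squared Euclidean distance).
        new_labels = [
            min((0, 1), key=lambda c: sum((x - m) ** 2
                                          for x, m in zip(row, centers[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its current members.
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels, centers


def predict_defect_prone(modules):
    """Return one boolean per module: True if it falls in the cluster
    whose center has the larger total of scaled metric values."""
    scaled = min_max_scale(modules)
    labels, centers = two_means(scaled)
    risky = 0 if sum(centers[0]) > sum(centers[1]) else 1
    return [lab == risky for lab in labels]


# Hypothetical, unlabeled modules: (lines of code, complexity, churn).
modules = [
    (120, 4, 2),
    (95, 3, 1),
    (1500, 40, 30),
    (1300, 35, 25),
    (80, 2, 0),
]
flags = predict_defect_prone(modules)
```

No defect labels are used at any point: the only inputs are the module metrics, which is what distinguishes this setting from the supervised models the study compares against.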

Original language: English
Article number: 110862
Pages (from-to): 1-22
Journal: Journal of Systems and Software
Volume: 172
Publication status: Published - Feb 2021

Keywords

  • Clustering-based unsupervised models
  • Data analytics for defect prediction
  • Empirical study

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
