TY - GEN
T1 - Understanding the automated parameter optimization on transfer learning for cross-project defect prediction: An empirical study
AU - Li, Ke
AU - Xiang, Zilin
AU - Chen, Tao
AU - Wang, Shuo
AU - Tan, Kay Chen
N1 - Funding Information:
K. Li was supported by UKRI Future Leaders Fellowship (Grant No. MR/S017062/1) and Royal Society (Grant No. IEC/NSFC/170243).
Publisher Copyright:
© 2020 Association for Computing Machinery.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/6/27
Y1 - 2020/6/27
N2 - Data-driven defect prediction has become increasingly important in software engineering process. Since it is not uncommon that data from a software project is insufficient for training a reliable defect prediction model, transfer learning that borrows data/konwledge from other projects to facilitate the model building at the current project, namely cross-project defect prediction (CPDP), is naturally plausible. Most CPDP techniques involve two major steps, i.e., transfer learning and classification, each of which has at least one parameter to be tuned to achieve their optimal performance. This practice fits well with the purpose of automated parameter optimization. However, there is a lack of thorough understanding about what are the impacts of automated parameter optimization on various CPDP techniques. In this paper, we present the first empirical study that looks into such impacts on 62 CPDP techniques, 13 of which are chosen from the existing CPDP literature while the other 49 ones have not been explored before. We build defect prediction models over 20 real-world software projects that are of different scales and characteristics. Our findings demonstrate that: (1) Automated parameter optimization substantially improves the defect prediction performance of 77% CPDP techniques with a manageable computational cost. Thus more efforts on this aspect are required in future CPDP studies. (2) Transfer learning is of ultimate importance in CPDP. Given a tight computational budget, it is more cost-effective to focus on optimizing the parameter configuration of transfer learning algorithms (3) The research on CPDP is far from mature where it is not difficult' to find a better alternative by making a combination of existing transfer learning and classification techniques. This finding provides important insights about the future design of CPDP techniques.
AB - Data-driven defect prediction has become increasingly important in software engineering process. Since it is not uncommon that data from a software project is insufficient for training a reliable defect prediction model, transfer learning that borrows data/konwledge from other projects to facilitate the model building at the current project, namely cross-project defect prediction (CPDP), is naturally plausible. Most CPDP techniques involve two major steps, i.e., transfer learning and classification, each of which has at least one parameter to be tuned to achieve their optimal performance. This practice fits well with the purpose of automated parameter optimization. However, there is a lack of thorough understanding about what are the impacts of automated parameter optimization on various CPDP techniques. In this paper, we present the first empirical study that looks into such impacts on 62 CPDP techniques, 13 of which are chosen from the existing CPDP literature while the other 49 ones have not been explored before. We build defect prediction models over 20 real-world software projects that are of different scales and characteristics. Our findings demonstrate that: (1) Automated parameter optimization substantially improves the defect prediction performance of 77% CPDP techniques with a manageable computational cost. Thus more efforts on this aspect are required in future CPDP studies. (2) Transfer learning is of ultimate importance in CPDP. Given a tight computational budget, it is more cost-effective to focus on optimizing the parameter configuration of transfer learning algorithms (3) The research on CPDP is far from mature where it is not difficult' to find a better alternative by making a combination of existing transfer learning and classification techniques. This finding provides important insights about the future design of CPDP techniques.
KW - Automated parameter optimization
KW - Classification techniques
KW - Cross-project defect prediction
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85087529683&partnerID=8YFLogxK
U2 - 10.1145/3377811.3380360
DO - 10.1145/3377811.3380360
M3 - Conference article published in proceeding or book
AN - SCOPUS:85087529683
T3 - Proceedings - International Conference on Software Engineering
SP - 566
EP - 577
BT - Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering, ICSE 2020
PB - IEEE Computer Society
T2 - 42nd ACM/IEEE International Conference on Software Engineering, ICSE 2020
Y2 - 27 June 2020 through 19 July 2020
ER -