Cross-project software defect prediction based on multi-source data sets
Author affiliation: School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
Publication: The Journal of China Universities of Posts and Telecommunications (中国邮电高校学报(英文版))
Year, volume, issue: 2021, Vol. 28, No. 4
Pages: 75-87
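Subject classification: 08 [Engineering]; 0835 [Engineering: Software Engineering]; 081202 [Engineering: Computer Software and Theory]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be awarded)]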
Keywords: cross-project defect prediction multi-source transfer adaptive boosting ensemble learning
Abstract: Cross-project defect prediction (CPDP) uses one or more source projects to build a defect prediction model and applies the model to a target project. The data distributions of the source and target projects usually differ considerably, which makes it difficult to construct an effective defect prediction model. To alleviate negative transfer between the source projects and the target project in CPDP, this paper proposes an integrated transfer adaptive boosting (TrAdaBoost) algorithm based on multi-source data sets (MSITrA). The algorithm uses an existing two-stage data filtering algorithm to select, from multiple source projects, the source data related to the target project, and then uses the integrated TrAdaBoost algorithm proposed in the paper to build a CPDP model. Experimental results on 15 public data sets from the PROMISE repository show that: 1) the proposed cross-project software defect prediction model performs best among all tested CPDP methods; 2) in the within-project software defect prediction (WPDP) experiment, the proposed CPDP method achieves better results than the tested WPDP method.
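The abstract does not spell out the integrated TrAdaBoost variant, so the following is only a minimal sketch of one weight-update round of the standard TrAdaBoost algorithm that MSITrA builds on, not the paper's method. The function name `tradaboost_reweight`, its parameters, and the clamping of the error rate are assumptions made for illustration. The idea it shows: misclassified source instances lose weight (they look unlike the target project), while misclassified target instances gain weight.

```python
import math


def tradaboost_reweight(src_w, tgt_w, src_miss, tgt_miss, n_rounds):
    """One round of the standard TrAdaBoost weight update (illustrative sketch).

    src_w, tgt_w       -- current weights of source / target instances
    src_miss, tgt_miss -- 1 if the weak learner misclassified the instance, else 0
    n_rounds           -- total number of boosting rounds planned
    """
    n = len(src_w)
    # The weighted error is measured on the target instances only.
    eps = sum(w * m for w, m in zip(tgt_w, tgt_miss)) / sum(tgt_w)
    # Assumption: clamp the error so beta_t stays positive and below 1.
    eps = min(max(eps, 1e-10), 0.499)
    beta_t = eps / (1.0 - eps)
    # Fixed shrink factor for source instances.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Misclassified source instances are down-weighted.
    new_src = [w * beta ** m for w, m in zip(src_w, src_miss)]
    # Misclassified target instances are up-weighted.
    new_tgt = [w * beta_t ** (-m) for w, m in zip(tgt_w, tgt_miss)]
    return new_src, new_tgt, beta_t


# Usage: four source and four target instances, one mistake in each group.
src, tgt, beta_t = tradaboost_reweight(
    [1.0] * 4, [1.0] * 4, [1, 0, 0, 0], [1, 0, 0, 0], n_rounds=10
)
```

In a full training loop the weights would be renormalized and a weak learner refit each round; in the original TrAdaBoost formulation the final prediction combines only the learners from the later half of the rounds.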