Effort-aware cross-project just-in-time defect prediction framework for mobile apps

Tian CHENG; Kunsong ZHAO; Song SUN; Muhammad MATEEN; Junhao WEN

doi:10.1007/s11704-021-1013-5

Front. Comput. Sci., 2022, Vol. 16, Issue 6: 166207. DOI: 10.1007/s11704-021-1013-5

Software

RESEARCH ARTICLE

Abstract

With the boom of mobile devices, Android mobile apps play an irreplaceable role in people’s daily lives. These apps are updated frequently, and each update involves many code commits to meet new requirements. Just-in-Time (JIT) defect prediction aims to identify whether commit instances will introduce defects into a new release of an app and provides immediate feedback to developers, which makes it well suited to mobile apps. Because within-app defect prediction needs sufficient historical data to label the commit instances, and such data is often inadequate in practice, one alternative is to use a cross-project model. In this work, we propose a novel method, called KAL, for the cross-project JIT defect prediction task in the context of Android mobile apps. More specifically, KAL first transforms the commit instances into a high-dimensional feature space using the kernel-based principal component analysis technique to obtain representative features. Then, an adversarial learning technique is used to extract a common feature embedding for model building. We conduct experiments on 14 Android mobile apps and employ four effort-aware indicators for performance evaluation. The results on 182 cross-project pairs demonstrate that the proposed KAL method performs better than 20 comparative methods.
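To make the first stage concrete, the following is a minimal sketch of kernel principal component analysis applied to commit feature vectors from a source app and a target app. It is not the authors' implementation: the RBF kernel, the `gamma` value, the number of components, the function names, and the synthetic data are all assumptions chosen for illustration, and the adversarial learning stage is not shown.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix between rows of X and rows of Y (assumed kernel choice)."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def kernel_pca_fit(X, n_components, gamma):
    """Fit kernel PCA on X and return the pieces needed to project new data."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one_n = np.full((n, n), 1.0 / n)
    # Center the kernel matrix in the implicit feature space.
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    top = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    # Scale eigenvectors so the feature-space components have unit norm.
    alphas = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    return {"X": X, "K": K, "alphas": alphas, "gamma": gamma}


def kernel_pca_transform(model, X_new):
    """Project X_new onto the components learned from the training data."""
    K_new = rbf_kernel(X_new, model["X"], model["gamma"])
    K = model["K"]
    # Center the new kernel rows with the training kernel statistics.
    K_new_centered = (
        K_new
        - K_new.mean(axis=1, keepdims=True)
        - K.mean(axis=0)[None, :]
        + K.mean()
    )
    return K_new_centered @ model["alphas"]


# Synthetic placeholder data: 30 source commits and 10 target commits,
# each described by 5 hypothetical change metrics.
rng = np.random.default_rng(0)
X_source = rng.normal(size=(30, 5))
X_target = rng.normal(size=(10, 5))

model = kernel_pca_fit(X_source, n_components=3, gamma=0.1)
Z_src = kernel_pca_transform(model, X_source)  # source commits in the new space
Z_tgt = kernel_pca_transform(model, X_target)  # target commits in the same space
print(Z_src.shape, Z_tgt.shape)
```

In this sketch the mapping is fitted on the source app only and then reused for the target app, so both sets of commits share one feature space. Whether KAL fits the mapping on source data, target data, or both is not stated in the abstract.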

Keywords

kernel-based principal component analysis / adversarial learning / just-in-time defect prediction / cross-project model

Tian CHENG, Kunsong ZHAO, Song SUN, Muhammad MATEEN, Junhao WEN. Effort-aware cross-project just-in-time defect prediction framework for mobile apps. Front. Comput. Sci., 2022, 16(6): 166207 https://doi.org/10.1007/s11704-021-1013-5
