Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt
Dongjin YU, Lin WANG, Xin CHEN, Jie CHEN
Technical debt is a metaphor for seeking short-term gains at the expense of long-term code quality. Previous studies have shown that self-admitted technical debt, which is introduced intentionally, has strong negative impacts on software development and incurs high maintenance overheads. To help developers identify self-admitted technical debt, researchers have proposed many state-of-the-art methods. However, there is still room to improve the effectiveness of current methods, because self-admitted technical debt comments vary in length, make up a low proportion of all comments, and are diverse in style. Therefore, in this paper, we propose a novel approach based on bidirectional long short-term memory (BiLSTM) networks with an attention mechanism to automatically detect self-admitted technical debt by leveraging source code comments. In BiLSTM, we utilize a balanced cross entropy loss function to overcome the class imbalance problem. We experimentally investigate the performance of our approach on a public dataset of 62,566 code comments from ten open source projects. Experimental results show that, on average, our approach achieves 81.75% precision, 72.24% recall and 75.86% F1-score, outperforming the state-of-the-art text mining-based method by 8.14%, 5.49% and 6.64%, respectively.
technical debt / self-admitted technical debt / long short-term memory / attention mechanism / natural language processing
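To illustrate the balanced cross entropy idea mentioned in the abstract, the following is a minimal Python sketch of one common class-balanced form of binary cross entropy, in which each class is weighted by the share of the other class. The abstract does not give the exact formulation used in the paper, so the weighting scheme, the function name `balanced_cross_entropy` and the `eps` clamp are assumptions made for illustration only.

```python
import math


def balanced_cross_entropy(probs, labels, eps=1e-12):
    """Mean class-balanced binary cross entropy (illustrative sketch).

    probs  -- predicted probability of the positive (SATD) class per comment
    labels -- 1 for an SATD comment, 0 for a non-SATD comment
    eps    -- hypothetical clamp value that keeps log() away from zero
    """
    if len(probs) != len(labels) or not labels:
        raise ValueError("probs and labels must be non-empty and equally long")

    n = len(labels)
    n_pos = sum(labels)
    # Assumption: beta is the share of negatives, used as the weight of the
    # rare positive class; negatives get the complementary weight (1 - beta).
    beta = (n - n_pos) / n

    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total -= beta * math.log(p)
        else:
            total -= (1.0 - beta) * math.log(1.0 - p)
    return total / n


# Toy batch: one SATD comment among four, all predicted with probability 0.5.
loss = balanced_cross_entropy([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 0])
```

With this weighting, the single positive comment in the toy batch contributes as much to the loss as the three negative comments together, which is the intended effect when SATD comments are a small fraction of the data.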