Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt

Dongjin YU, Lin WANG, Xin CHEN, Jie CHEN

Front. Comput. Sci., 2021, Vol. 15, Issue (4): 154208

DOI: 10.1007/s11704-020-9281-z
RESEARCH ARTICLE


Abstract

Technical debt is a metaphor for seeking short-term gains at the expense of long-term code quality. Previous studies have shown that self-admitted technical debt, which is introduced intentionally, has strong negative impacts on software development and incurs high maintenance overheads. To help developers identify self-admitted technical debt, researchers have proposed many state-of-the-art methods. However, the effectiveness of current methods can still be improved, because self-admitted technical debt comments vary in length, make up a low proportion of all comments, and are written in diverse styles. Therefore, in this paper, we propose a novel approach based on bidirectional long short-term memory (BiLSTM) networks with the attention mechanism to automatically detect self-admitted technical debt by leveraging source code comments. In BiLSTM, we utilize a balanced cross entropy loss function to overcome the class imbalance problem. We experimentally investigate the performance of our approach on a public dataset including 62,566 code comments from ten open source projects. Experimental results show that our approach achieves, on average, 81.75% precision, 72.24% recall and 75.86% F1-score, and outperforms the state-of-the-art text mining-based method by 8.14%, 5.49% and 6.64%, respectively.
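As a rough illustration of how a balanced cross entropy loss can counter class imbalance, the sketch below weights each class by the share of the opposite class, so that errors on the rare self-admitted technical debt class cost more. The abstract does not give the paper's exact formulation, so this weighting scheme, the function name `balanced_cross_entropy`, and its parameters are assumptions made for illustration only.

```python
import math


def balanced_cross_entropy(probs, labels, eps=1e-12):
    """Class-balanced binary cross entropy (illustrative formulation, assumed).

    probs  -- predicted probability that each comment is SATD (label 1)
    labels -- ground-truth labels: 1 for SATD, 0 for non-SATD
    """
    n = len(labels)
    n_pos = sum(labels)
    # beta is the share of negative samples; the rare positive class
    # therefore receives the larger weight.
    beta = (n - n_pos) / n
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -beta * math.log(p)
        else:
            total += -(1.0 - beta) * math.log(1.0 - p)
    return total / n


# One SATD comment among ten, mirroring the low proportion noted above.
labels = [1] + [0] * 9
uniform = balanced_cross_entropy([0.5] * 10, labels)
missed_positive = balanced_cross_entropy([0.1] + [0.5] * 9, labels)
false_alarm = balanced_cross_entropy([0.5, 0.9] + [0.5] * 8, labels)
```

With these weights, the single positive sample contributes as much to the loss as the nine negatives combined, so `missed_positive` comes out larger than `false_alarm`.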

Keywords

technical debt / self-admitted technical debt / long short-term memory / attention mechanism / natural language processing

Cite this article

Dongjin YU, Lin WANG, Xin CHEN, Jie CHEN. Using BiLSTM with attention mechanism to automatically detect self-admitted technical debt. Front. Comput. Sci., 2021, 15(4): 154208 DOI:10.1007/s11704-020-9281-z



RIGHTS & PERMISSIONS

Higher Education Press
