CodeAttention: translating source code to comments by exploiting the code constructs

Wenhao ZHENG; Hongyu ZHOU; Ming LI; Jianxin WU

doi:10.1007/s11704-018-7457-6

Front. Comput. Sci. ›› 2019, Vol. 13 ›› Issue (3) : 565-578. DOI: 10.1007/s11704-018-7457-6

RESEARCH ARTICLE

Abstract

Appropriate comments on code snippets provide insight into code functionality and aid program comprehension. However, because writing comments is costly, many projects lack adequate comments. Automatic comment generation techniques have been proposed to generate comments from pieces of code and thereby reduce the human effort of annotating code. Most existing approaches exploit certain correlations (usually manually specified) between code and generated comments; these correlations are easily violated when coding patterns change, and comment generation performance then declines. In addition, recent approaches do not exploit code constructs and instead treat code snippets as plain text. Furthermore, previous datasets are too small to validate the methods and show their advantages. In this paper, we propose a new attention mechanism called CodeAttention to translate code to comments, which makes use of code constructs such as critical statements, symbols, and keywords. By focusing on these specific points, CodeAttention can capture the semantics of code better than previous methods. To verify our approach on a wider range of coding patterns, we build a large dataset from open-source projects on GitHub. Experimental results on this large dataset demonstrate that the proposed method outperforms existing approaches in both objective and subjective evaluation. We also perform ablation studies to determine the effects of the different parts of CodeAttention.

Keywords

software mining / machine learning / code comment generation / recurrent neural network / attention mechanism

Cite this article

Wenhao ZHENG, Hongyu ZHOU, Ming LI, Jianxin WU. CodeAttention: translating source code to comments by exploiting the code constructs. Front. Comput. Sci., 2019, 13(3): 565‒578 https://doi.org/10.1007/s11704-018-7457-6

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

Received: 29 Dec 2017
Accepted: 07 Jun 2018
Published: 15 Jun 2019
Just Accepted Date: 21 Jun 2018
Online First Date: 22 Oct 2018
Issue Date: 24 Apr 2019
