Abstract
Existing techniques for automatic code commenting assume that the code snippet to be commented has already been identified, and thus require users to provide the snippet in advance. A smarter commenting approach would first determine where to comment in the given source code and then generate comments for the code snippets that need them. To achieve the first step of this goal, we propose a novel method, CommtPst, to automatically find appropriate commenting positions in source code. Since commenting is closely related to code syntax and semantics, we adopt a neural language model (word embeddings) to capture semantic information and analyze abstract syntax trees to capture syntactic information. We then employ a long short-term memory (LSTM) network to model the long-term logical dependency of code statements over the fused semantic and syntactic information and to learn the commenting patterns in the code sequence. We evaluated CommtPst on large data sets from dozens of open-source software systems on GitHub. The experimental results show that CommtPst achieves precision, recall and F-measure values of 0.792, 0.602 and 0.684, respectively, an 11.4% improvement in F-measure over the traditional machine learning method.
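As a quick consistency check on the reported figures, the short sketch below recomputes the F-measure from the stated precision and recall. It assumes the abstract's F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's F-measure is F1; this is not stated there.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for CommtPst.
reported_precision = 0.792
reported_recall = 0.602

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.684, matching the reported F-measure
```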
Original language | English |
---|---|
Article number | 110754 |
Pages (from-to) | 1-14 |
Journal | Journal of Systems and Software |
Volume | 170 |
Publication status | Published - Dec 2020 |
Keywords
- Code semantics
- Code syntax
- Comment generation
- Comment position
- LSTM
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture