TY - JOUR
T1 - Towards automatically generating block comments for code snippets
AU - Huang, Yuan
AU - Huang, Shaohao
AU - Chen, Huanchao
AU - Chen, Xiangping
AU - Zheng, Zibin
AU - Luo, Xiapu
AU - Jia, Nan
AU - Hu, Xinyu
AU - Zhou, Xiaocong
N1 - Funding Information:
This research is supported by the Key-Area Research and Development Program of Guangdong Province (2020B010164002), National Natural Science Foundation of China (61902441, 61672545, 61722214), Hong Kong RGC Projects (No. 152223/17E, 152239/18E), Guangdong Basic and Applied Basic Research Foundation (2020A1515010973), China Postdoctoral Science Foundation (2018M640855), Fundamental Research Funds for the Central Universities (20wkpy06, 20lgpy129). Xiangping Chen is the corresponding author.
Publisher Copyright:
© 2020
PY - 2020/11
Y1 - 2020/11
N2 - Code commenting is a common programming practice of practical importance that helps developers review and comprehend source code. There are two main types of code comments for a method: header comments, which are located before a method and summarize its functionality, and block comments, which describe the functionality of the code snippets within a method. Inspired by the effectiveness of deep learning techniques in the NLP field, many studies use machine translation models to automatically generate comments for source code. Because data sets of block comments are difficult to collect, current studies focus more on the automatic generation of header comments than on that of block comments. However, block comments are important for program comprehension because they explain the code snippets within a method. To fill this gap, in our previous study we proposed an approach that combines heuristic rules with a learning-based method to collect a large number of comment-code pairs from 1,032 open source projects. In this paper, we propose a reinforcement learning-based method, RL-BlockCom, to automatically generate block comments for code snippets based on the collected comment-code pairs. Specifically, we use the abstract syntax tree (AST) of a code snippet to generate a token sequence via a statement-based traversal. We then propose a composite learning model, which combines the actor-critic algorithm of reinforcement learning with the encoder-decoder algorithm, to generate block comments. On the data set of comment-code pairs, our method achieves a BLEU-4 score of 24.28, outperforming the baselines and the state of the art in comment generation.
AB - Code commenting is a common programming practice of practical importance that helps developers review and comprehend source code. There are two main types of code comments for a method: header comments, which are located before a method and summarize its functionality, and block comments, which describe the functionality of the code snippets within a method. Inspired by the effectiveness of deep learning techniques in the NLP field, many studies use machine translation models to automatically generate comments for source code. Because data sets of block comments are difficult to collect, current studies focus more on the automatic generation of header comments than on that of block comments. However, block comments are important for program comprehension because they explain the code snippets within a method. To fill this gap, in our previous study we proposed an approach that combines heuristic rules with a learning-based method to collect a large number of comment-code pairs from 1,032 open source projects. In this paper, we propose a reinforcement learning-based method, RL-BlockCom, to automatically generate block comments for code snippets based on the collected comment-code pairs. Specifically, we use the abstract syntax tree (AST) of a code snippet to generate a token sequence via a statement-based traversal. We then propose a composite learning model, which combines the actor-critic algorithm of reinforcement learning with the encoder-decoder algorithm, to generate block comments. On the data set of comment-code pairs, our method achieves a BLEU-4 score of 24.28, outperforming the baselines and the state of the art in comment generation.
KW - Automatic comment generation
KW - Code comment scope
KW - Reinforcement learning
KW - Source code summarization
UR - http://www.scopus.com/inward/record.url?scp=85087525711&partnerID=8YFLogxK
U2 - 10.1016/j.infsof.2020.106373
DO - 10.1016/j.infsof.2020.106373
M3 - Journal article
AN - SCOPUS:85087525711
SN - 0950-5849
VL - 127
SP - 1
EP - 12
JO - Information and Software Technology
JF - Information and Software Technology
M1 - 106373
ER -