TY - JOUR
T1 - MGF6mARice: prediction of DNA N6-methyladenine sites in rice by exploiting molecular graph feature and residual block
AU - Liu, Mengya
AU - Sun, Zhan Li
AU - Zeng, Zhigang
AU - Lam, Kin Man
N1 - Funding Information:
This work was supported by a National Natural Science Foundation of China (No. 61972002).
Publisher Copyright:
© The Author(s) 2022. Published by Oxford University Press. All rights reserved.
PY - 2022/5/1
Y1 - 2022/5/1
N2 - DNA N6-methyladenine (6mA) is produced by the N6 position of the adenine being methylated, which occurs at the molecular level, and is involved in numerous vital biological processes in the rice genome. Given the shortcomings of biological experiments, researchers have developed many computational methods to predict 6mA sites and achieved good performance. However, the existing methods do not consider the occurrence mechanism of 6mA to extract features from the molecular structure. In this paper, a novel deep learning method is proposed by devising DNA molecular graph feature and residual block structure for 6mA sites prediction in rice, named MGF6mARice. Firstly, the DNA sequence is changed into a simplified molecular input line entry system (SMILES) format, which reflects chemical molecular structure. Secondly, for the molecular structure data, we construct the DNA molecular graph feature based on the principle of graph convolutional network. Then, the residual block is designed to extract higher level, distinguishable features from molecular graph features. Finally, the prediction module is used to obtain the result of whether it is a 6mA site. By means of 10-fold cross-validation, MGF6mARice outperforms the state-of-the-art approaches. Multiple experiments have shown that the molecular graph feature and residual block can promote the performance of MGF6mARice in 6mA prediction. To the best of our knowledge, it is the first time to derive a feature of DNA sequence by considering the chemical molecular structure. We hope that MGF6mARice will be helpful for researchers to analyze 6mA sites in rice.
AB - DNA N6-methyladenine (6mA) is produced by the N6 position of the adenine being methylated, which occurs at the molecular level, and is involved in numerous vital biological processes in the rice genome. Given the shortcomings of biological experiments, researchers have developed many computational methods to predict 6mA sites and achieved good performance. However, the existing methods do not consider the occurrence mechanism of 6mA to extract features from the molecular structure. In this paper, a novel deep learning method is proposed by devising DNA molecular graph feature and residual block structure for 6mA sites prediction in rice, named MGF6mARice. Firstly, the DNA sequence is changed into a simplified molecular input line entry system (SMILES) format, which reflects chemical molecular structure. Secondly, for the molecular structure data, we construct the DNA molecular graph feature based on the principle of graph convolutional network. Then, the residual block is designed to extract higher level, distinguishable features from molecular graph features. Finally, the prediction module is used to obtain the result of whether it is a 6mA site. By means of 10-fold cross-validation, MGF6mARice outperforms the state-of-the-art approaches. Multiple experiments have shown that the molecular graph feature and residual block can promote the performance of MGF6mARice in 6mA prediction. To the best of our knowledge, it is the first time to derive a feature of DNA sequence by considering the chemical molecular structure. We hope that MGF6mARice will be helpful for researchers to analyze 6mA sites in rice.
KW - DNA molecular graph feature
KW - DNA N6-methyladenine
KW - residual block
KW - rice genome
KW - SMILES
UR - http://www.scopus.com/inward/record.url?scp=85130767221&partnerID=8YFLogxK
U2 - 10.1093/bib/bbac082
DO - 10.1093/bib/bbac082
M3 - Journal article
C2 - 35325050
AN - SCOPUS:85130767221
SN - 1467-5463
VL - 23
SP - 1
EP - 15
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 3
M1 - bbac082
ER -