TY - JOUR
T1 - Multi-kernel feature extraction with dynamic fusion and downsampled residual feature embedding for predicting rice RNA N6-methyladenine sites
AU - Liu, Mengya
AU - Sun, Zhan Li
AU - Zeng, Zhigang
AU - Lam, Kin Man
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press.
PY - 2024/12
Y1 - 2024/12
N2 - RNA N6-methyladenosine (m6A) is a critical epigenetic modification closely related to rice growth, development, and stress response. Accurate identification of m6A sites, which bears directly on precision rice breeding and improvement, is fundamental to revealing the regulatory and molecular mechanisms underlying phenotypes. Because rice m6A sequences vary in length, maximum-length padding and label encoding are usually adopted to obtain max-length padded sequences that can be input into a model for prediction. Although this retains complete sequence information, it results in sparse information and invalid padding, which reduce feature extraction accuracy. Moreover, existing rice-specific m6A prediction methods are still at an early stage. To address these issues, we develop a new end-to-end deep learning framework, MFDm6ARice, for predicting rice m6A sites. In particular, to alleviate sparseness, we construct a multi-kernel feature fusion module that mines essential information in max-length padded sequences with a multi-kernel feature extraction function and transfers information effectively through a global–local dynamic fusion function. Furthermore, to handle the complexity and computational cost of the high-dimensional features caused by invalid padding, we design a downsampling residual feature embedding module that optimizes feature space compression while achieving accurate feature expression and efficient computation. Experiments show that MFDm6ARice outperforms comparison methods in cross-validation and on same-species and cross-species independent test sets, demonstrating good robustness and generalization. Its application to maize m6A indicates the scalability of MFDm6ARice. Further investigations show that combining different kernel features, attending to global channel and local spatial information, and employing reasonable downsampling and residual connections can improve feature representation and extraction, ensure effective information transfer, and significantly enhance model performance.
AB - RNA N6-methyladenosine (m6A) is a critical epigenetic modification closely related to rice growth, development, and stress response. Accurate identification of m6A sites, which bears directly on precision rice breeding and improvement, is fundamental to revealing the regulatory and molecular mechanisms underlying phenotypes. Because rice m6A sequences vary in length, maximum-length padding and label encoding are usually adopted to obtain max-length padded sequences that can be input into a model for prediction. Although this retains complete sequence information, it results in sparse information and invalid padding, which reduce feature extraction accuracy. Moreover, existing rice-specific m6A prediction methods are still at an early stage. To address these issues, we develop a new end-to-end deep learning framework, MFDm6ARice, for predicting rice m6A sites. In particular, to alleviate sparseness, we construct a multi-kernel feature fusion module that mines essential information in max-length padded sequences with a multi-kernel feature extraction function and transfers information effectively through a global–local dynamic fusion function. Furthermore, to handle the complexity and computational cost of the high-dimensional features caused by invalid padding, we design a downsampling residual feature embedding module that optimizes feature space compression while achieving accurate feature expression and efficient computation. Experiments show that MFDm6ARice outperforms comparison methods in cross-validation and on same-species and cross-species independent test sets, demonstrating good robustness and generalization. Its application to maize m6A indicates the scalability of MFDm6ARice. Further investigations show that combining different kernel features, attending to global channel and local spatial information, and employing reasonable downsampling and residual connections can improve feature representation and extraction, ensure effective information transfer, and significantly enhance model performance.
KW - downsampling residual embedding
KW - global–local dynamic fusion
KW - multi-kernel feature
KW - rice genome
KW - RNA N6-methyladenine
UR - http://www.scopus.com/inward/record.url?scp=85212797792&partnerID=8YFLogxK
U2 - 10.1093/bib/bbae647
DO - 10.1093/bib/bbae647
M3 - Journal article
C2 - 39674264
AN - SCOPUS:85212797792
SN - 1467-5463
VL - 26
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbae647
ER -