TY - JOUR
T1 - CircRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier
AU - Li, Hui
AU - Deng, Zhaohong
AU - Yang, Haitao
AU - Pan, Xiaoyong
AU - Shen, Hong Bin
AU - Choi, Kup Sze
AU - Wang, Lei
AU - Wang, Shitong
AU - Wu, Jing
N1 - Publisher Copyright:
© 2021 The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Circular RNAs (circRNAs) generally bind to RNA-binding proteins (RBPs) to play an important role in the regulation of autoimmune diseases. Thus, it is crucial to study the binding sites of RBPs on circRNAs. Although many methods, including traditional machine learning and deep learning, have been developed to predict the interactions between RNAs and RBPs, and most of them are focused on linear RNAs. At present, few studies have been done on the binding relationships between circRNAs and RBPs. Thus, in-depth research is urgently needed. In the existing circRNA-RBP binding site prediction methods, circRNA sequences are the main research subjects, but the relevant characteristics of circRNAs have not been fully exploited, such as the structure and composition information of circRNA sequences. Some methods have extracted different views to construct recognition models, but how to efficiently use the multi-view data to construct recognition models is still not well studied. Considering the above problems, this paper proposes a multi-view classification method called DMSK based on multi-view deep learning, subspace learning and multi-view classifier for the identification of circRNA-RBP interaction sites. In the DMSK method, first, we converted circRNA sequences into pseudo-amino acid sequences and pseudo-dipeptide components for extracting high-dimensional sequence features and component features of circRNAs, respectively. Then, the structure prediction method RNAfold was used to predict the secondary structure of the RNA sequences, and the sequence embedding model was used to extract the context-dependent features. Next, we fed the above four views' raw features to a hybrid network, which is composed of a convolutional neural network and a long short-term memory network, to obtain the deep features of circRNAs. Furthermore, we used view-weighted generalized canonical correlation analysis to extract four views' common features by subspace learning. Finally, the learned subspace common features and multi-view deep features were fed to train the downstream multi-view TSK fuzzy system to construct a fuzzy rule and fuzzy inference-based multi-view classifier. The trained classifier was used to predict the specific positions of the RBP binding sites on the circRNAs. The experiments show that the prediction performance of the proposed method DMSK has been improved compared with the existing methods. The code and dataset of this study are available at https://github.com/Rebecca3150/DMSK.
AB - Circular RNAs (circRNAs) generally bind to RNA-binding proteins (RBPs) to play an important role in the regulation of autoimmune diseases. Thus, it is crucial to study the binding sites of RBPs on circRNAs. Although many methods, including traditional machine learning and deep learning, have been developed to predict the interactions between RNAs and RBPs, and most of them are focused on linear RNAs. At present, few studies have been done on the binding relationships between circRNAs and RBPs. Thus, in-depth research is urgently needed. In the existing circRNA-RBP binding site prediction methods, circRNA sequences are the main research subjects, but the relevant characteristics of circRNAs have not been fully exploited, such as the structure and composition information of circRNA sequences. Some methods have extracted different views to construct recognition models, but how to efficiently use the multi-view data to construct recognition models is still not well studied. Considering the above problems, this paper proposes a multi-view classification method called DMSK based on multi-view deep learning, subspace learning and multi-view classifier for the identification of circRNA-RBP interaction sites. In the DMSK method, first, we converted circRNA sequences into pseudo-amino acid sequences and pseudo-dipeptide components for extracting high-dimensional sequence features and component features of circRNAs, respectively. Then, the structure prediction method RNAfold was used to predict the secondary structure of the RNA sequences, and the sequence embedding model was used to extract the context-dependent features. Next, we fed the above four views' raw features to a hybrid network, which is composed of a convolutional neural network and a long short-term memory network, to obtain the deep features of circRNAs. Furthermore, we used view-weighted generalized canonical correlation analysis to extract four views' common features by subspace learning. Finally, the learned subspace common features and multi-view deep features were fed to train the downstream multi-view TSK fuzzy system to construct a fuzzy rule and fuzzy inference-based multi-view classifier. The trained classifier was used to predict the specific positions of the RBP binding sites on the circRNAs. The experiments show that the prediction performance of the proposed method DMSK has been improved compared with the existing methods. The code and dataset of this study are available at https://github.com/Rebecca3150/DMSK.
KW - circRNA-RBP binding site prediction
KW - deep feature learning
KW - multi-view TSK fuzzy system
KW - WGCCA
UR - http://www.scopus.com/inward/record.url?scp=85120737760&partnerID=8YFLogxK
U2 - 10.1093/bib/bbab394
DO - 10.1093/bib/bbab394
M3 - Journal article
C2 - 34571539
AN - SCOPUS:85120737760
SN - 1467-5463
VL - 23
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 1
M1 - bbab394
ER -