TY - JOUR
T1 - An improved multi-modal representation-learning model based on fusion networks for property prediction in drug discovery
AU - Wu, Jinzhou
AU - Su, Yang
AU - Yang, Ao
AU - Ren, Jingzheng
AU - Xiang, Yi
N1 - Funding Information:
This study was funded by the Scientific Research Foundation of Chongqing University of Science and Technology (Grant No. CKRC2020028); the National Natural Science Foundation of China (22308037); the Natural Science Foundation of Chongqing, China (Grant No. CSTB2022NSCQ-MSX0655); the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202201516); and the Foundation of Provincial-Ministerial Collaborative Innovation Center for Resource Treatment of Domestic Waste in Chongqing University of Science and Technology (Grant No. SHLJZYH2021-10).
Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/10
Y1 - 2023/10
N2 - Accurate characterization of molecular representations plays an important role in deep learning (DL)-based property prediction for drug discovery. However, most previous studies considered only one type of molecular representation, making it difficult to capture the full molecular feature information. In this study, a novel DL framework called the multi-modal molecular representation learning fusion network (MMRLFN) is developed, which can simultaneously learn and integrate drug molecular features from molecular graphs and SMILES sequences. The MMRLFN comprises three complementary deep neural networks that learn various features from different molecular representations, such as molecular topology, local chemical background information, and substructures at varying scales. Eight public datasets involving various molecular properties used in drug discovery were employed to train and evaluate the MMRLFN. The resulting models showed better performance than existing models based on mono-modal molecular representations. Additionally, a thorough analysis of the noise resistance and interpretability of the MMRLFN was carried out. The generalization ability and effectiveness of the MMRLFN were also verified by case studies. Overall, the MMRLFN can accurately predict molecular properties and provide potentially valuable information from large datasets, thereby increasing the likelihood of successful drug discovery.
AB - Accurate characterization of molecular representations plays an important role in deep learning (DL)-based property prediction for drug discovery. However, most previous studies considered only one type of molecular representation, making it difficult to capture the full molecular feature information. In this study, a novel DL framework called the multi-modal molecular representation learning fusion network (MMRLFN) is developed, which can simultaneously learn and integrate drug molecular features from molecular graphs and SMILES sequences. The MMRLFN comprises three complementary deep neural networks that learn various features from different molecular representations, such as molecular topology, local chemical background information, and substructures at varying scales. Eight public datasets involving various molecular properties used in drug discovery were employed to train and evaluate the MMRLFN. The resulting models showed better performance than existing models based on mono-modal molecular representations. Additionally, a thorough analysis of the noise resistance and interpretability of the MMRLFN was carried out. The generalization ability and effectiveness of the MMRLFN were also verified by case studies. Overall, the MMRLFN can accurately predict molecular properties and provide potentially valuable information from large datasets, thereby increasing the likelihood of successful drug discovery.
KW - Deep learning
KW - Drug discovery
KW - Feature fusion
KW - Multi-modal molecular representations
KW - Property prediction
UR - http://www.scopus.com/inward/record.url?scp=85169878892&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2023.107452
DO - 10.1016/j.compbiomed.2023.107452
M3 - Journal article
C2 - 37690287
AN - SCOPUS:85169878892
SN - 0010-4825
VL - 165
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 107452
ER -