TY - JOUR
T1 - Insight to the prediction of CO2 solubility in ionic liquids based on the interpretable machine learning model
AU - Yang, Ao
AU - Sun, Shirui
AU - Su, Yang
AU - Kong, Zong Yang
AU - Ren, Jingzheng
AU - Shen, Weifeng
N1 - Funding Information:
This work is supported by the National Natural Science Foundation of China (22308037, 22278044), Natural Science Foundation of Chongqing, China (Grant No. CSTB2022NSCQ-MSX0655), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202201516), Chongqing Special Support Fund for Post Doctor (Grant No. 2022CQBSHTB3047), the Chongqing Science Fund for Distinguished Young Scholars (No. CSTB2022NSCQ-JQX0021), the Chongqing Innovation Support Key Program for Returned Overseas Chinese Scholars (cx2023002), a grant from Departmental General Research Fund (Grant No. G-UARF, Project ID: P0045761), and a grant from Research Institute for Advanced Manufacturing (RIAM), The Hong Kong Polytechnic University (1-CD9G, Project ID: P0046135).
Publisher Copyright:
© 2024 Elsevier Ltd
PY - 2024/9/5
Y1 - 2024/9/5
AB - In this work, we investigated three machine learning (ML) models, i.e., Gaussian process regression (GPR), LightGBM, and CatBoost, for predicting the solubility of CO2 in various ionic liquids (ILs). Three types of molecular descriptors, i.e., group contribution (GC), molecular structure descriptors (MSD), and hybrid GC-MSD, were used with each of the three models. Model performance was evaluated using mean absolute error (MAE), coefficient of determination (R2), and mean relative error (MRE) (i.e., relative deviation in percentage), with each model tested multiple times using different random state parameters. The dataset was partitioned into training and testing sets at an 80:20 ratio, with additional splits at various ratios to assess the sensitivity of prediction performance. Overall, all models predicted CO2 solubility in ILs well, with performance varying by descriptor type. Notably, the hybrid GC-MSD descriptor consistently outperformed the others, which is attributed to its incorporating a broader array of molecular feature information. In particular, the CatBoost-GC-MSD model performed best, achieving an R2 of 0.9925, MAE of 0.0122, and MRE of 11.1550%. Comparison with previous studies showed the superior performance of CatBoost-GC-MSD across all descriptor types. Furthermore, model interpretation using Shapley additive explanations (SHAP) analysis identified pressure, temperature, Chi0, Kappa2, and EState_VSA10 as the top five influential input features. These findings provide insight into the molecular features affecting CO2 solubility in ILs and lay a foundation for future research in this field.
KW - CO2 capture
KW - Interpretation model
KW - Ionic liquids
KW - Machine learning
KW - QSPR
UR - http://www.scopus.com/inward/record.url?scp=85194159422&partnerID=8YFLogxK
U2 - 10.1016/j.ces.2024.120266
DO - 10.1016/j.ces.2024.120266
M3 - Journal article
AN - SCOPUS:85194159422
SN - 0009-2509
VL - 297
JO - Chemical Engineering Science
JF - Chemical Engineering Science
M1 - 120266
ER -