Bridging machine learning and COSMO-SAC for accurate prediction of infinite dilute activity coefficients of binary mixtures

Yuxin Qiu , Guzhong Chen , Qian Liu , Zhiwen Qi , Kake Zhu , Zhen Song

ENG. Chem. Eng. ›› 2026, Vol. 20 ›› Issue (1) : 4

RESEARCH ARTICLE

Abstract

The infinite dilution activity coefficient (γ∞) is a key thermodynamic parameter in solvent design for chemical processes. Although the conductor-like screening model for segment activity coefficient (COSMO-SAC) has strong a priori predictive capability, its estimates are sometimes only qualitative rather than quantitative. COSMO-SAC also relies on time-intensive quantum chemistry calculations, which limits its scalability for large-scale solvent screening. To overcome these issues, this study integrates COSMO-SAC with machine learning for accurate γ∞ prediction of binary mixtures. A multi-task machine learning model bypasses the quantum chemistry calculations by rapidly predicting the surface charge density distributions (σ-profiles) and molecular cavity volumes (VCOSMO) of molecules and ions, while accurately distinguishing isomers. Four adjustable parameters of COSMO-SAC are optimized using more than 20,000 experimental γ∞ data points, and residual systematic errors are then corrected with a boosting ensemble strategy to improve model performance. The resulting hybrid model reduces the mean absolute error from 0.944 to 0.102 (R² = 0.969), an 89% improvement, while preserving the physicochemical interpretability of the model. This accurate and efficient approach broadens the practical applicability of σ-profile and VCOSMO prediction, as well as of COSMO-SAC-based γ∞ calculations, facilitating high-throughput solvent screening for diverse chemical engineering applications.
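The residual-correction step can be illustrated with a minimal sketch. The abstract does not specify the boosting algorithm, the input features, or the data, so everything below is an assumption made for illustration: the data are synthetic, the "baseline" is a stand-in for a physics-based prediction rather than COSMO-SAC itself, and the booster is a hand-rolled gradient boosting over one-dimensional decision stumps. The last function only reproduces the arithmetic behind the reported 89% figure.

```python
import math

LEARNING_RATE = 0.3  # shrinkage per boosting round (illustrative value)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(mae_before, mae_after):
    """Fractional reduction in MAE, e.g. 0.944 -> 0.102 gives about 0.892."""
    return (mae_before - mae_after) / mae_before


def fit_stump(x, r):
    """Find the single threshold on x that minimizes squared error on r."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        thr = 0.5 * (lo + hi)
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]  # (threshold, left mean, right mean)


def boost_residuals(x, residuals, n_rounds=50):
    """Fit stumps sequentially, each one to what the previous rounds left over."""
    stumps, r = [], list(residuals)
    for _ in range(n_rounds):
        thr, lm, rm = fit_stump(x, r)
        stumps.append((thr, lm, rm))
        r = [ri - LEARNING_RATE * (lm if xi <= thr else rm) for xi, ri in zip(x, r)]
    return stumps


def predict_correction(stumps, xi):
    """Sum the shrunken stump outputs for a single input value."""
    return sum(LEARNING_RATE * (lm if xi <= thr else rm) for thr, lm, rm in stumps)


# Synthetic example: the baseline misses a smooth systematic term (sin x).
x = [i / 10 for i in range(40)]
y_true = [math.sin(xi) + 0.5 * xi for xi in x]   # hypothetical target values
baseline = [0.5 * xi for xi in x]                # hypothetical physics-based prediction
residuals = [t - b for t, b in zip(y_true, baseline)]

stumps = boost_residuals(x, residuals)
corrected = [b + predict_correction(stumps, xi) for xi, b in zip(x, baseline)]

mae_before = mae(y_true, baseline)
mae_after = mae(y_true, corrected)
print(f"in-sample MAE: baseline {mae_before:.3f}, corrected {mae_after:.3f}")
print(f"reported improvement: {relative_improvement(0.944, 0.102):.1%}")
```

The error here is measured on the same points used for fitting, so the sketch shows only the mechanism (a physics-based estimate plus a learned correction to its residuals), not generalization performance.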

Keywords

infinite dilution activity coefficient / COSMO-SAC / machine learning / solvent design / multi-task learning

Cite this article

Yuxin Qiu, Guzhong Chen, Qian Liu, Zhiwen Qi, Kake Zhu, Zhen Song. Bridging machine learning and COSMO-SAC for accurate prediction of infinite dilute activity coefficients of binary mixtures. ENG. Chem. Eng., 2026, 20(1): 4 DOI:10.1007/s11705-026-2625-y

RIGHTS & PERMISSIONS

Higher Education Press
