A new simple and efficient molecular descriptor for the fast and accurate prediction of log P

Xiaojian Zeng, Xin Ye, Donghua Liu, Ningyi Cui, Xiaopeng Li, Yufan Bao, Yecheng Zhou

Journal of Materials Informatics, 2025, Vol. 5, Issue 1: 4

DOI: 10.20517/jmi.2024.61
Research Article


Abstract

The partition coefficient (log P) is a critical parameter that measures the balance between hydrophilicity and lipophilicity of molecules, playing a key role in molecular material design and drug development. Developing accurate, efficient, and computationally simple models for log P prediction is essential for advancing drug discovery and materials science. In this study, we introduce the optimized 3D molecular representation of structures based on electron diffraction descriptor (opt3DM) into machine learning (ML) frameworks, achieving highly accurate log P predictions. By fine-tuning key parameters, the scale factor (sL) and descriptor dimension (Ns), we identified the optimal values of sL = 0.5 and Ns = 500. Among various ML algorithms tested, automatic relevance determination (ARD) regression, Ridge regression, and Bayesian Ridge regression demonstrated superior predictive performance. These optimized models outperformed the OPEn structure-activity/property relationship app (OPERA) model on the M-dataset and also delivered competitive results in the SAMPL6 and SAMPL9 challenges. Our findings not only establish a robust, fast, and precise approach for log P prediction, but also highlight the potential of opt3DM as a powerful tool for molecular representation. This work lays a foundation for broader applications in molecular material design and drug development.
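As a rough illustration of the kind of descriptor the abstract refers to, the sketch below evaluates the general 3D-MoRSE form, I(s) = Σ_{i<j} w_i w_j sin(s·r_ij)/(s·r_ij), on a grid of Ns scattering values. The abstract does not say how sL and Ns enter the calculation, so the grid s_k = sL·k (k = 0 … Ns−1), the function name `morse_descriptor`, the use of atomic numbers as weights, and the water-like geometry are all assumptions made for illustration, not the authors' implementation.

```python
import math
from itertools import combinations


def morse_descriptor(coords, weights, s_l=0.5, n_s=500):
    """Return a 3D-MoRSE-style descriptor as a list of n_s values.

    coords  : list of (x, y, z) atomic positions in angstroms
    weights : one atomic weight per atom (e.g. atomic number)
    s_l     : scale factor; ASSUMED here to be the step of the s grid
    n_s     : descriptor dimension (number of s values)
    """
    # Precompute the weight product and distance for every atom pair i < j.
    pairs = [
        (weights[i] * weights[j], math.dist(coords[i], coords[j]))
        for i, j in combinations(range(len(coords)), 2)
    ]
    descriptor = []
    for k in range(n_s):
        s = s_l * k  # ASSUMPTION: s_k = s_l * k; the source does not define the grid
        total = 0.0
        for ww, r in pairs:
            x = s * r
            # sin(x)/x tends to 1 as x -> 0, so handle s = 0 explicitly.
            total += ww * (math.sin(x) / x if x != 0.0 else 1.0)
        descriptor.append(total)
    return descriptor


# Hypothetical water-like geometry, atomic numbers as weights.
coords = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.2400, 0.9266, 0.0)]
weights = [8.0, 1.0, 1.0]
descriptor = morse_descriptor(coords, weights, s_l=0.5, n_s=500)
print(len(descriptor), descriptor[0])
```

A fixed-length vector of this kind could then be passed as a feature row to a linear model such as Ridge, Bayesian Ridge, or ARD regression, which are the model families the abstract reports as performing best.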

Keywords

Molecular descriptor / machine learning / partition coefficient / optimized 3D MoRSE descriptor / SAMPL6 / SAMPL9

Cite this article

Xiaojian Zeng, Xin Ye, Donghua Liu, Ningyi Cui, Xiaopeng Li, Yufan Bao, Yecheng Zhou. A new simple and efficient molecular descriptor for the fast and accurate prediction of log P. Journal of Materials Informatics, 2025, 5(1): 4. DOI: 10.20517/jmi.2024.61


