A computational toolbox for molecular property prediction based on quantum mechanics and quantitative structure-property relationship
Qilei Liu, Yinke Jiang, Lei Zhang, Jian Du
The chemical industry continually seeks to convert raw materials efficiently and economically into commodity chemicals and higher value-added chemical-based products. The life cycle of a chemical product spans conceptual product design, experimental investigation, sustainable manufacturing through appropriate chemical processes, and waste disposal. Throughout these stages, a key element is the molecular property prediction model, which associates molecular structures with product properties. In this paper, a framework combining quantum mechanics and quantitative structure-property relationship is established for fast prediction of molecular properties such as the activity coefficient. The workflow of the framework consists of three steps. In the first step, a database is created to collect basic molecular information. In the second step, quantum mechanics-based calculations are performed to predict quantum mechanics-based or quantum mechanics-derived molecular properties (pseudo-experimental data), which are stored in a database. In the third step, these data are used to develop quantitative structure-property relationship methods for fast property prediction. The whole framework has been implemented in a molecular property prediction toolbox. Two case studies highlighting different aspects of the toolbox are presented: the prediction of heats of reaction and the prediction of solid-liquid phase equilibria.
molecular property / quantum mechanics / quantitative structure-property relationship / heat of reaction / solid-liquid phase equilibrium
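As a rough illustration of the heat-of-reaction case study named in the abstract, the sketch below sums stoichiometrically weighted enthalpies of formation taken from a property database, i.e. ΔH_rxn = Σ ν_i ΔH_f,i with ν_i negative for reactants and positive for products. This is the standard thermodynamic relation, not necessarily the toolbox's own implementation; the function name `heat_of_reaction`, the dictionary layout, the species labels, and all numerical values are hypothetical placeholders rather than data from the paper.

```python
from typing import Mapping


def heat_of_reaction(stoich: Mapping[str, float],
                     enthalpy: Mapping[str, float]) -> float:
    """Return sum(nu_i * H_i) over all species in the reaction.

    stoich   : species -> stoichiometric coefficient
               (negative for reactants, positive for products)
    enthalpy : species -> enthalpy of formation (kJ/mol), e.g. values
               stored after a quantum mechanics-based calculation
    """
    missing = [s for s in stoich if s not in enthalpy]
    if missing:
        raise KeyError(f"no enthalpy stored for: {missing}")
    return sum(nu * enthalpy[s] for s, nu in stoich.items())


# Hypothetical database entries in kJ/mol (placeholders, NOT real data).
enthalpy_db = {"A": -100.0, "B": -50.0, "C": -200.0}

# Hypothetical reaction A + B -> C.
reaction = {"A": -1.0, "B": -1.0, "C": 1.0}

delta_h = heat_of_reaction(reaction, enthalpy_db)
print(f"Heat of reaction: {delta_h:.1f} kJ/mol")
```

Keeping the stored enthalpies separate from the reaction definition mirrors the database-then-prediction split described in the abstract: the same stored values can be reused for any reaction whose species are already in the database.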