Advanced deep learning methods for molecular property prediction

Chao Pang, Henry H. Y. Tong, Leyi Wei

Quant. Biol. 2023, Vol. 11, Issue 4: 395-404. DOI: 10.1002/qub2.23
REVIEW ARTICLE

Abstract

The prediction of molecular properties is a crucial task in drug discovery. Computational methods that predict molecular properties accurately can significantly accelerate the drug discovery process and reduce its cost. In recent years, steady advances in computing hardware and the rise of deep learning have opened a new and effective path for molecular property prediction. Deep learning methods can leverage the vast amount of data accumulated over years of drug discovery and do not require complex feature engineering. In this review, we summarize the molecular representations and commonly used datasets in molecular property prediction models and present advanced deep learning methods for molecular property prediction, including state-of-the-art networks such as graph neural networks and Transformer-based models, as well as state-of-the-art strategies such as 3D pre-training, contrastive learning, multi-task learning, transfer learning, and meta-learning. We also point out some critical issues, such as a lack of datasets, low information utilization, and a lack of disease specificity.

Keywords

dataset / deep learning / molecular property prediction / molecular representations

Cite this article

Chao Pang, Henry H. Y. Tong, Leyi Wei. Advanced deep learning methods for molecular property prediction. Quant. Biol., 2023, 11(4): 395‒404 https://doi.org/10.1002/qub2.23

RIGHTS & PERMISSIONS

2023 The Authors. Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.