A comprehensive review of molecular optimization in artificial intelligence-based drug discovery

Yuhang Xia, Yongkang Wang, Zhiwei Wang, Wen Zhang

Quant. Biol. 2024, Vol. 12, Issue (1): 15-29. DOI: 10.1002/qub2.30
REVIEW ARTICLE


Abstract

Drug discovery aims to design novel molecules with specific chemical properties for the treatment of targeted diseases. Molecular optimization, which improves the physical and chemical properties of a molecule, is an important step in drug discovery. Artificial intelligence techniques have recently shown considerable success in drug discovery and have emerged as a new strategy for addressing the challenges of drug design, including molecular optimization, while drastically reducing the cost and time of drug discovery. We review the latest advances in molecular optimization in artificial intelligence-based drug discovery, covering data resources, molecular properties, optimization methodologies, and assessment criteria. Specifically, we classify the optimization methodologies into molecular mapping-based, molecular distribution matching-based, and guided search-based methods, and discuss the principles of these methods as well as their pros and cons. Moreover, we highlight the current challenges in molecular optimization and offer perspectives on potential new lines of future research, including interpretability, multidimensional optimization, and model generalization. This review is intended to guide researchers who are interested in artificial intelligence-based molecular optimization.

Keywords

artificial intelligence / drug discovery / molecular optimization

Cite this article

Yuhang Xia, Yongkang Wang, Zhiwei Wang, Wen Zhang. A comprehensive review of molecular optimization in artificial intelligence-based drug discovery. Quant. Biol., 2024, 12(1): 15‒29 https://doi.org/10.1002/qub2.30


RIGHTS & PERMISSIONS

2024 The Authors. Quantitative Biology published by John Wiley & Sons Australia, Ltd on behalf of Higher Education Press.