Exploration on learning molecular docking with deep learning models

Qin Xie, Wei Ma, Jianhang Zhang, Shiliang Li, Xiaobing Deng, Youjun Xu, Weilin Zhang

Quant. Biol., 2023, Vol. 11, Issue 3: 320-331. DOI: 10.15302/J-QB-022-0321
RESEARCH ARTICLE


Abstract

Background: Molecular docking-based virtual screening (VS) aims to choose ligands with potential pharmacological activities from millions or even billions of molecules. This process can significantly cut down the number of compounds that need to be tested experimentally. However, many molecules in such libraries have low affinity for a particular protein target, so docking them wastes considerable computational resources.

Methods: We implemented a fast and practical molecular screening approach called DL-DockVS (deep learning dock virtual screening) by using deep learning models (regression and classification models) to learn the outcomes of pipelined docking programs step-by-step.
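The general idea of letting a learned model stand in for docking can be sketched as follows. This is a minimal, generic illustration and not the DL-DockVS implementation: the function name `surrogate_screen`, the callables `dock` and `fit`, and the parameters `sample_size` and `keep_fraction` are all hypothetical, and the sketch assumes that lower (more negative) docking scores are better.

```python
import random
from typing import Callable, Dict, List, Sequence


def surrogate_screen(
    library: Sequence[str],
    dock: Callable[[str], float],
    fit: Callable[[List[str], List[float]], Callable[[str], float]],
    sample_size: int,
    keep_fraction: float,
    seed: int = 0,
) -> Dict[str, float]:
    """Dock a random sample, fit a surrogate model on those scores, then
    dock only the remaining compounds that the surrogate ranks best.

    All names here are hypothetical; `dock` and `fit` are supplied by the caller.
    Assumes lower docking scores are better.
    """
    rng = random.Random(seed)

    # Step 1: dock a random training sample with the real docking program.
    sample = rng.sample(list(library), sample_size)
    scores = {mol: dock(mol) for mol in sample}

    # Step 2: train a surrogate model (regression) on the docked sample.
    predict = fit(sample, [scores[mol] for mol in sample])

    # Step 3: rank the undocked compounds by predicted score, best first.
    rest = [mol for mol in library if mol not in scores]
    rest.sort(key=predict)

    # Step 4: spend docking time only on the best-ranked fraction.
    n_keep = int(len(rest) * keep_fraction)
    for mol in rest[:n_keep]:
        scores[mol] = dock(mol)

    return scores
```

In practice `dock` would wrap a docking program and `fit` would train a deep learning model; passing both in as callables keeps the sketch independent of any particular tool.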

Results: In this study, we showed that this approach could successfully weed out compounds with poor docking scores while keeping compounds with potentially high docking scores against 10 DUD-E protein targets. A self-built dataset of about 1.9 million molecules was used to further verify DL-DockVS, yielding good results in terms of recall rate, active-compound enrichment factor and runtime speed.
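For readers unfamiliar with the two screening metrics named here, the sketch below uses commonly used definitions, which may differ in detail from those in the full article. The function names and arguments are hypothetical, and lower docking scores are assumed to be better.

```python
from typing import Dict, Sequence, Set


def recall_of_top_scorers(
    true_scores: Dict[str, float], selected: Set[str], top_n: int
) -> float:
    """Fraction of the top_n best (lowest) docking scores that appear in `selected`."""
    best = sorted(true_scores, key=true_scores.get)[:top_n]
    return sum(mol in selected for mol in best) / top_n


def enrichment_factor(
    ranked_ids: Sequence[str], actives: Set[str], fraction: float
) -> float:
    """Hit rate among the top `fraction` of a ranked list divided by the
    hit rate over the whole list (ranked_ids is ordered best first)."""
    n_top = max(1, int(len(ranked_ids) * fraction))
    hit_rate_top = sum(mol in actives for mol in ranked_ids[:n_top]) / n_top
    hit_rate_all = sum(mol in actives for mol in ranked_ids) / len(ranked_ids)
    if hit_rate_all == 0:
        raise ValueError("no active compounds in the ranked list")
    return hit_rate_top / hit_rate_all
```

An enrichment factor above 1 means the top of the ranked list contains proportionally more actives than the library as a whole; a recall close to 1 means the filter discarded few of the truly best-scoring compounds.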

Conclusions: We comprehensively evaluated the practicality and effectiveness of DL-DockVS against 10 protein targets. Because it shortens runtime while maintaining the success rate, it is a useful and promising approach for screening ultra-large compound libraries in the age of big data. A model trained for one specific target can also be conveniently reused to predict high docking-score molecules in other chemical libraries without repeating the docking computation.

Author summary

A deep learning-powered VS approach combined with two free docking programs is proposed and evaluated for screening an ultra-large compound library to obtain diverse potential active compounds rapidly and efficiently. We found that it is a practical and transferable strategy that significantly reduces computational cost.

Keywords

molecular docking / ultra-large virtual screening / deep learning

Cite this article

Qin Xie, Wei Ma, Jianhang Zhang, Shiliang Li, Xiaobing Deng, Youjun Xu, Weilin Zhang. Exploration on learning molecular docking with deep learning models. Quant. Biol., 2023, 11(3): 320‒331 https://doi.org/10.15302/J-QB-022-0321

DATA AVAILABILITY

The Python code is available on GitHub (OpenIIPharma/mlddm). The MOL2 and CSV files of the clustered compounds from ChemDiv are available in Additional file 2. Docking scores of Training Set1 and the subsequent Training Set2 for each target were saved as CSV files and are provided in Additional file 3. The SMILES, MOL2 and SDF files of the DUD-E compounds and the PDB files of the receptors used for validation are provided in Additional file 4. The SMILES of 500,000 compounds randomly selected from the ChEMBL database are provided in Additional file 5. The SMILES of compounds with reported activities from the ChEMBL database for each target are provided in Additional file 6. All of these Additional files can be downloaded from Zenodo (5665378).

SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/10.15302/J-QB-022-0321.

ACKNOWLEDGEMENTS

We thank Dr. Jianfeng Pei for valuable discussions related to deep learning and computer-aided drug discovery. Part of the computation and analysis was performed on the High Performance Computing Platform of the Peking-Tsinghua Center for Life Sciences, Peking University. We also thank Dr. Zihao Shen, Dr. Fangjin Chen and Ms. Ting Fang for their help with computational resources. This work was supported by funding from Infinite Intelligence Pharma Ltd.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Qin Xie, Wei Ma, Jianhang Zhang, Shiliang Li, Xiaobing Deng, Youjun Xu and Weilin Zhang declare that they have no conflict of interest or financial conflicts to disclose.
All procedures performed in studies involving animals were in accordance with the ethical standards of the institution or practice at which the studies were conducted, and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

OPEN ACCESS

This article is licensed under a Creative Commons Attribution 4.0 International License (CC BY), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

RIGHTS & PERMISSIONS

2023 The Author(s). Published by Higher Education Press.