1 INTRODUCTION
Drug discovery is an expensive, difficult, and drawn-out process. Depending on the disease and treatment modality, the cost of developing an approved drug has increased from $300 million to $2.8 billion over the past decades [
1]. Virtual screening (VS), a typical computational technique, has been widely applied in the early stages of drug discovery to accelerate lead discovery [
2]. Docking-based virtual screening (DockVS) uses the three-dimensional structure of a target protein to computationally place and assess small molecules, one at a time, at the binding site. By iteratively refining the binding poses of the molecules, molecular docking can explicitly suggest new compounds with high docking scores.
The currently available molecular docking programs, including their release years, developing organizations, descriptions, and licenses, are summarized in Supplementary Table S1 [
3–
19]. Because their scoring functions and sampling techniques vary greatly, different programs may produce different docking results for a given protein target. In real applications, it is fairly common to use two docking programs sequentially in molecular virtual screening, following a well-known funnel-like workflow [
20–
31]. In the sequential docking workflow, the first docking stage is typically fast but of relatively low precision, while the second stage is more precise but slower, so a balance between efficiency and docking quality is reached. After this filter, more accurate but slower evaluations, together with human visual inspection, are applied to choose compounds for experimental verification.
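The funnel-like workflow described above can be sketched in a few lines of Python. The two scoring functions below are toy stand-ins for a fast first-stage program and a slower, more precise second-stage program, not real docking code; all names and data are illustrative.

```python
# Toy sketch of a funnel-like sequential docking workflow: a fast,
# lower-precision scoring pass prunes the library, then a slower,
# more precise pass rescores only the survivors.

def fast_score(smiles: str) -> float:
    # stand-in for a fast first-stage docking program
    return float(len(smiles) % 7)

def precise_score(smiles: str) -> float:
    # stand-in for a slower, more accurate second-stage program
    return float(len(smiles) % 5)

def funnel_screen(library, keep_fraction=0.2):
    """Keep the best-scoring fraction after each stage (lower = better here)."""
    ranked = sorted(library, key=fast_score)
    survivors = ranked[: max(1, int(len(ranked) * keep_fraction))]
    reranked = sorted(survivors, key=precise_score)
    return reranked[: max(1, int(len(reranked) * keep_fraction))]

library = [f"C{'C' * i}O" for i in range(100)]  # dummy SMILES-like strings
hits = funnel_screen(library)
print(len(hits))  # 4: two successive 20% cuts keep 4% of the library
```

The two successive cuts are what motivates the 20% x 20% = 4% selection rate discussed later for DL-DockVS.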
Recently, it has been reported that docking based VS processes on ultra-large databases can significantly increase the success rate of finding active hits [
32]. Meanwhile, the size of compound databases has also increased significantly. For example, the widely used ZINC database has grown from 700 thousand compounds in 2005 to over 1.3 billion in 2019 [
33–
35]. By using parallel distributed cloud computing capabilities on the Apache Spark engine, Capuccini
et al. [
36] carried out the VS procedure on a large-scale compound library and achieved good parallel efficiency. However, only a tiny fraction of the compounds will have high docking scores, while docking the great majority of low-score compounds consumes most of the computational resources. If researchers could identify high-score compounds in advance using trustworthy machine learning methods, the computational resources required for docking could be significantly reduced.
Gentile
et al. [
37,
38] developed a deep learning (DL) classification model named deep docking (DD), based on the docking scores of a subset of compounds, to speed up the prediction of the remaining compounds in an ultra-large library of a billion compounds (ZINC-15). Several active learning-based workflows have recently been proposed [
39–
43]. Yang
et al. applied an active learning process with DOCK3.7 and Glide SP. Starting from a 0.1% random subset of the total library, they iteratively updated the DL models with an active learning strategy. This VS process could recover the majority of the top-ranking compounds; however, it still needs to dock about 5% of the total compounds, a greater computational burden than DD, which docks only 1%. As suggested by Graff
et al. [
44], pruning the search space is necessary to accelerate the whole screening process. It is therefore highly valuable to develop an accelerated VS approach that reduces the computational burden of docking low-score compounds in real-world scenarios. A scoring function is often used to estimate the potential binding affinity of a given compound against a target of interest, and it is intuitive and straightforward to build regression models to fit the affinity scores produced by a given docking program. There has been a pronounced trend toward machine learning- and deep learning-based scoring functions [
45,
46]. These methods have improved both the speed and accuracy of prediction. However, in many real-world cases scoring functions do not correlate well with binding affinities, and a significant remaining obstacle is how effectively a single scoring function can discriminate true actives from numerous inactives [
45].
To improve the reliability of docking assessments, multiple docking programs were used in this study. Considering the fundamental principles of docking programs and the potential advantages of deep learning, we developed a new rapid and practical VS strategy, called DL-DockVS, which consists of two sets of deep learning models (DL-DockVS-R regression models and DL-DockVS-C classification models) in tandem, mimicking the funnel-like VS procedure that eliminates many decoy compounds. We used the output scores of two docking programs to construct the DL-DockVS-R and DL-DockVS-C models. The rationale for pairing a percentile-regression model with a classification model is as follows. Because different scoring functions use different scales, it is hard to set a single threshold value on the raw scores; the ranking order, rather than the absolute docking scores, is what matters for the first filtration, and this ranking is the regression task of the DL-DockVS-R models, which can easily be used to predict and select top-ranking compounds from an ultra-large library. The top-ranking compounds are then passed to the DL-DockVS-C models, which assign potential active labels to candidates chosen for further evaluation. In comprehensive evaluations, the DL-DockVS approach showed a decent capacity to find known active compounds for 10 protein targets, with active enrichment factors ranging from 5 to 17. Further analysis on the target BRAF showed that DL-DockVS has a high screening speed. These findings demonstrate that this approach can confidently and swiftly screen out molecules with low docking scores, which are probably inactive, and that it serves as a useful filter in routine VS processes on ultra-large chemical libraries.
2 RESULTS AND DISCUSSION
2.1 DL-DockVS model performance
To reflect the consistency of the models' performance and their generalizability, we summarized the training statistical metrics of the models for the 10 selected targets in Supplementary Fig. S1. The area under the curve (AUC) and root mean squared error (RMSE) on the test sets of the DL-DockVS models for each target are presented in Tab.1. For the DL-DockVS-R models, the RMSE values range from 0.11 to 0.14, indicating an acceptable error in ranking prediction. For the DL-DockVS-C models, the AUC values range from 0.89 to 0.96, indicating an excellent ability to distinguish positive data (the top 20% of compounds ranked by Dock-2) from negative data (the remaining compounds).
2.2 Evaluations of DL-DockVS-R models
2.2.1 Model evaluation on external test sets
Considering that there may be topological bias in the DUD-E dataset, we did not incorporate these compounds into the training set, and used them only as an external test set [47]. The DUD-E actives/decoys were docked with VinaLC, and compounds whose docking scores fell within the top 20% percentile of Training Set1 (see Methods) were selected. These compounds were then compared with those selected by the DL-DockVS-R models; the results are listed in Tab.2. Among the 10 targets, all except ACE showed consistency rates between VinaLC docking and the DL-DockVS-R models above 75.0%, with the highest rate of 98% for CDK2. These results suggest that the DL-DockVS-R models are good at learning the top-20% filtering process and achieve acceptable performance compared with the corresponding docking methods.
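The consistency rate reported in Tab.2 can be understood as the overlap between the top-20% set chosen by real docking and the set chosen by the model. A minimal sketch, using toy data and hypothetical compound names rather than the paper's actual results:

```python
# Consistency rate between two top-20% selections: one from real docking
# scores, one from the model's predicted ranking percentiles.

def top_fraction(scores: dict, fraction=0.2, higher_is_better=True):
    """Return the set of compounds in the top fraction by score."""
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return set(ranked[: int(len(ranked) * fraction)])

def consistency_rate(docking_scores, predicted_scores, fraction=0.2):
    docked = top_fraction(docking_scores, fraction)
    predicted = top_fraction(predicted_scores, fraction)
    return len(docked & predicted) / len(docked)

docking = {f"cpd{i}": -i for i in range(50)}           # toy: cpd0 ranks best
model = {f"cpd{i}": 1 - i / 50 for i in range(50)}     # toy: perfectly consistent
print(consistency_rate(docking, model))  # 1.0
```

With real data the two rankings disagree to some degree, giving the 75%-98% rates reported above.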
2.2.2 Model evaluation on the ChEMBL subset
We used the DL-DockVS-R models to predict the ranking percentage of the compounds in the ChEMBL subset. The compounds with predicted top-20% rankings were then docked to their targets using VinaLC to test their real score distributions. Fig.1 and Supplementary Fig. S2 show the VinaLC docking-score distributions of the selected ChEMBL compounds, the whole ChemDiv dataset, and the top 20% ranked compounds for each protein target. In Fig.1, most of the selected ChEMBL compounds fall in the top-20% region of the ChemDiv subset, suggesting that our DL-DockVS-R models have a strong ability to enrich compounds with high docking scores. The models can therefore quickly single out compounds with potentially high docking scores for specific targets without docking the whole dataset.
2.3 Evaluation of DL-DockVS-C models on external test sets
In the previous section, the DUD-E actives/decoys were docked with VinaLC, and those within the top 20% ranking percentile of Training Set1 were selected and further docked with rDock. Compounds with docking scores within the top 20% percentile of Training Set2 were labeled as positives, while the bottom 20% were labeled as negatives. These compounds were then predicted by the DL-DockVS-C models. Tab.3 shows the statistical results of the various metrics on the DUD-E external data set. Except for the target DPP4, the accuracies of the other 9 targets are above 70%. The specificity values of 8 targets are above 72%, the exceptions being DPP4 and BRAF. As for the true positive rates (TPRs), the maximum and minimum values were 89.5% for BRAF and 27.5% for JAK2. The performance of DL-DockVS-C clearly differs across protein targets. Generally, the DL-DockVS-C models show good accuracy and specificity on the external test sets from DUD-E.
2.4 Performance of DL-DockVS models on retrieving active compounds
To further verify whether the DL-DockVS models can identify active compounds towards specific targets, active compounds for the 10 targets were collected from ChEMBL and filtered according to a set of activity thresholds. For instance, a compound with an IC50/EC50 value below 50 μM is labeled as a positive sample, and otherwise as negative.
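The labeling rule above amounts to a simple threshold. A minimal sketch, assuming all activity values are expressed in μM (the helper name is ours, not from the paper):

```python
# Label a compound active (1) when its IC50/EC50 is below the cutoff,
# inactive (0) otherwise. Units are assumed to be consistent (uM here).

def label_active(ic50_um: float, cutoff_um: float = 50.0) -> int:
    return int(ic50_um < cutoff_um)

print(label_active(12.5), label_active(80.0))  # 1 0
```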
To validate the ability to recall actives, about 1.9 million molecules from the ChEMBL database were first predicted with the DL-DockVS-R models, and the compounds with predicted top-20% rankings were then predicted by the DL-DockVS-C models. The results in Tab.4 show the percentages of retrieved active compounds under different activity cutoffs. The statistical results indicate that most of the models can identify over 50% of the actives for their targets; for example, the recall percentages for ADRB1, BRAF, EGFR, JAK2, LCK, and VGFR2 were 65.78%, 79.30%, 68.63%, 51.88%, 74.61%, and 52.72%, respectively. In other words, most of the active compounds could be recalled by our DL-DockVS models for most targets.
The final number of compounds with a positive label was about 4% of the total compounds predicted, meaning that our models' behavior mimics the step-by-step docking process (20% × 20%, as in the real process). We therefore denote the enrichment factor as EF0.04. The summarized results for the 10 targets are listed in Tab.5: EF0.04 ranges from 5.86 (CDK2) to 17.03 (ADRB1), revealing that the DL-DockVS models have a good ability to enrich active compounds.
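The enrichment factor EF0.04 used above is the hit rate among the ~4% of compounds labeled positive, divided by the overall hit rate in the library. A minimal sketch with made-up counts, not the paper's data:

```python
# Enrichment factor: active rate in the selected subset divided by the
# active rate in the whole library.

def enrichment_factor(n_actives_selected, n_selected, n_actives_total, n_total):
    hit_rate_selected = n_actives_selected / n_selected
    hit_rate_overall = n_actives_total / n_total
    return hit_rate_selected / hit_rate_overall

# e.g. 68 of 4,000 selected compounds active, 100 of 100,000 overall
print(round(enrichment_factor(68, 4000, 100, 100000), 2))  # 17.0
```

An EF of 17 means the selected 4% subset is 17 times richer in actives than random picking, matching the upper end of the range reported in Tab.5.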
2.5 Running time comparison
To demonstrate the speed advantage of DL-DockVS, the target BRAF was selected as a case study, using the drug-like molecules (981,247,974 compounds) from ZINC-15. The statistical results in Tab.6 show that, with the available computing resources, the prediction speed of DL-DockVS is greatly improved over traditional docking processes, with an average speed-up of about 2000 times. DL-DockVS can quickly filter out low-score compounds against specific targets, while the compounds with high docking scores (top 4%) are retrieved at an acceptable rate.
In the DD protocol [
38], DD needed to dock 1% of the molecules while retrieving 90% of the best-scoring structures. The DL-DockVS approach shows similar performance in the example in Tab.6: it weeded out 99.67% of the low-score compounds for the BRAF target from the whole dataset.
Such a high clearance rate suggests that, in ultra-large-library virtual screening, the chance of obtaining more tightly binding, high-scoring compounds comes at the computational cost of docking ever more low-scoring ones. Based on the results in Tab.6, for example, when screening the 10-20 million purchasable compounds available worldwide [48], about 63-126 thousand compounds would be selected by DL-DockVS for further evaluation, an order of magnitude well within the capability of normal docking computation.
3 DISCUSSION
Due to the availability and accessibility of docking software, we tested only the combination of VinaLC and rDock, which tend to be orthogonal to each other. Conceptually, this approach should generalize to other combinations of docking programs and is flexible enough to accommodate the different requirements of various projects.
To explore a reasonable threshold for preparing the training dataset, the DUD-E active compounds were used for a statistical analysis of the relationship between the enrichment ratio of active compounds and the top-percentile cutoff of the two docking programs for each of the 10 selected targets. Representative results for 4 targets are shown in Fig.2; the others are shown in Supplementary Fig. S3. The x-axis represents the percentile cutoff of the compounds ranked by docking score, and the y-axis represents the proportion of DUD-E active compounds enriched for the corresponding target. For most targets, the top 20% of compounds docked by the two docking programs enriches about 40% of the DUD-E active compounds; the top 20% cutoff was therefore selected for preparing the training dataset. Overall, the active-compound enrichment ability of rDock is better than that of VinaLC.
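The curves behind this analysis plot, for each top-x% cutoff of the docking-score ranking, the fraction of known actives recovered. A toy sketch of that computation, with synthetic scores rather than the paper's data:

```python
# Enrichment curve: fraction of known actives recovered within each
# top-x% cutoff of the docking-score ranking (lower score = better).

def enrichment_curve(scores, actives, cutoffs):
    """scores: {compound: docking score}; actives: set of compound names."""
    ranked = sorted(scores, key=scores.get)          # best score first
    recovered = []
    for c in cutoffs:
        top = set(ranked[: int(len(ranked) * c)])
        recovered.append(len(top & actives) / len(actives))
    return recovered

scores = {f"m{i}": float(i) for i in range(100)}     # m0 has the best score
actives = {"m0", "m1", "m2", "m3", "m50"}            # 4 of 5 actives score well
print(enrichment_curve(scores, actives, [0.05, 0.2]))  # [0.8, 0.8]
```

Sweeping the cutoffs from 0 to 1 traces out a curve like those in Fig.2; the value at x = 0.2 is what motivated the top-20% threshold.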
To obtain a higher enrichment ratio, we further explored the ordering of the two docking programs. The enrichment ratios of the different orderings are summarized in Fig.2. The red curve represents the enrichment ratio for the corresponding target using rDock alone, the blue curve that of VinaLC alone, the green curve the re-docking of the top 20% of compounds with rDock after docking with VinaLC, and the orange curve the reverse order, rDock first and then VinaLC.
As shown in the corresponding panels of Fig.2, when the performances of the two programs are relatively similar for a given target, the combination of the two docking programs outperforms either single program. Nevertheless, in common circumstances a well-performing program may be limited in use for various reasons and cannot be deployed at scale, so only a slightly weaker program is available for large-scale screening. For the remaining panels of Fig.2, the end points of the red lines are higher than those of the blue lines, indicating that the performances of the two docking programs differ considerably and that rDock achieves the better enrichment. Because the two targets docked by VinaLC had low enrichment ratios of DUD-E active compounds, VinaLC is the step that limits the enrichment ratio, and the serial use of VinaLC and rDock does not lead to an improvement in the final performance. In this instance, the green line can be regarded as simulating the practical strategy of using a fast program with a slightly lower enrichment ratio (such as VinaLC) for rough screening in the first step, and then using the program with the higher enrichment ratio (such as rDock) in the second step. The end points of the green lines are higher than those of the blue lines, illustrating that this strategy improves the active-enrichment capability compared with the single weaker program (blue lines); however, some loss of active enrichment is unavoidable compared with the better-performing program alone (red lines).
4 CONCLUSION
In this work, we demonstrated a fast and reusable approach, DL-DockVS, which mimics the funnel-like screening process with deep learning models. We implemented it using the docking results for specific targets on a commercially purchasable ChemDiv subset with two open-source docking programs (VinaLC and rDock). Various statistical results showed that the models perform well in the VS process. Moreover, compared with the traditional VS process, DL-DockVS can realize virtual screening of an ultra-large compound library with a large improvement in runtime. The well-trained models can easily be extended to other chemical libraries. With the growing application of molecular generative models in drug development in recent years, we believe DL-DockVS could serve as a rapid scoring metric for deep generative models, efficiently producing diverse compounds with high docking scores towards a specific protein target.
5 METHODS
5.1 Model construction and usages
The workflow of the DL-DockVS construction and usage is illustrated in Fig.3. For a given protein target, two supervised training processes are implemented. In the first, the compounds from the training library (N compounds) are docked onto the target with Dock-1, the first docking program, and all the docking results form the first training set (Training Set1) for this target. The compounds with the top 20% ranking in the first docking results are selected for the second docking process, and those results form the second training set (Training Set2). We then label the training sets according to the docking results. For the regression task, the ranking percentage values of the compounds in Training Set1 are used as ground-truth labels. For the classification task, the top 20% ranked compounds within Training Set2 are used as positive samples, and the remaining compounds as negative samples. The regression models (DL-DockVS-R) and the classification models (DL-DockVS-C) are trained on Training Set1 and Training Set2, respectively. After evaluation, the models can be used for ultra-large library filtering: the DL-DockVS-R models are first used to select the compounds predicted to rank within the top 20%, and these compounds are then predicted by the DL-DockVS-C models. The compounds with positive labels can be used for further examination.
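The label construction described above can be sketched as follows, assuming that a lower docking score indicates a better pose; the scores are synthetic and the helper names are ours.

```python
# Build the two kinds of training labels from docking scores:
# - regression labels: each compound's ranking percentage (0 = best score)
# - classification labels: 1 for the top 20% by score, else 0

def ranking_percentage(scores: dict) -> dict:
    """Map each compound to its rank percentile (lower score = better)."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {cpd: i / n for i, cpd in enumerate(ranked)}

def top_fraction_labels(scores: dict, fraction=0.2) -> dict:
    """Binary labels: 1 for the top fraction by score, else 0."""
    pct = ranking_percentage(scores)
    return {cpd: int(p < fraction) for cpd, p in pct.items()}

set1_scores = {f"c{i}": float(i) for i in range(10)}  # toy Dock-1 scores
reg_labels = ranking_percentage(set1_scores)          # DL-DockVS-R targets
cls_labels = top_fraction_labels(set1_scores)         # DL-DockVS-C-style targets
print(reg_labels["c0"], cls_labels["c0"], cls_labels["c9"])  # 0.0 1 0
```

In the actual pipeline the classification labels come from the second docking program's scores on Training Set2, but the thresholding logic is the same.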
5.2 Molecular docking programs and parameter settings
We summarized available molecular docking programs in Supplementary Table S1. Due to intellectual property issues, two open-source and freely available docking programs, namely VinaLC [
49,
50] (a Vina-based implementation for large datasets) and rDock [
16], were selected in this research. VinaLC is based on an empirical scoring function developed by the Molecular Graphics Laboratory (MGL Tools). Compared to AutoDock 4.0, VinaLC improves the average accuracy of binding-mode prediction and accelerates the search by using a simpler scoring function. rDock is a fast and versatile program for docking molecules against both proteins and nucleic acids. Both programs can be installed and run on computing clusters with many CPU cores, making it possible to complete VS procedures within a few days.
In our cases, the protein structures were preprocessed using ADTools, and the small molecules were prepared using OpenBabel [
51]. For the parameters used in Vina, the grid center was defined as the native ligand center in the crystal structure, and a cubic grid was used. For rDock, the rbcavity program was used to determine the grid parameters from the native ligand with default settings. VinaLC was selected as the first docking program (Dock-1), and rDock as the second docking program (Dock-2). The docking scores of Dock-1 and Dock-2 were saved as CSV-format files (Additional file 3).
5.3 Deep learning model construction
Chemprop (GitHub: chemprop/chemprop) is an open-source, deep learning-based toolkit for molecular property modeling and prediction [52]. It was used to develop our DL-DockVS models. Its architecture is based on graph convolutional networks, which treat a molecule as an attributed graph with node features (atoms) and edge features (bonds). The toolkit was used to build the DL-DockVS-R and DL-DockVS-C models with default hyperparameters. In addition, the rdkit_2d_normalized feature setting and the no_features_scaling setting were added to the training process to improve the generalization capability of the models. Both the DL-DockVS-R and DL-DockVS-C models were trained and validated on a random 80% split of each dataset and tested on the remaining 20%. The hyperparameters hidden_size, depth, dropout, and ffn_num_layers were optimized by 10-fold cross validation, and the optimal values were used to construct the final DL-DockVS-R and DL-DockVS-C models.
5.4 Model evaluation metrics
For the regression tasks, the RMSE is calculated to evaluate the ability of the models to reproduce the docking ranking. For the classification tasks, the metrics of AUC, Accuracy, Precision, Specificity, FPR (false positive rate), TPR (true positive rate), and F1-score are calculated to evaluate model performance on external datasets. The formulas are as follows:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Precision = TP/(TP + FP)
Specificity = TN/(TN + FP)
FPR = FP/(FP + TN)
TPR = TP/(TP + FN)
F1-score = 2 × Precision × TPR/(Precision + TPR)
The above formulas are the evaluation metrics of the classification models, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. Accuracy is the percentage of correct predictions on the test data. Precision is the fraction of true positives among all samples predicted positive. Specificity is the proportion of negative samples that are correctly predicted. TPR (recall) is the proportion of actual positive samples that are correctly predicted. F1-score is the harmonic mean of Precision and Recall.
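These confusion-matrix definitions can be collected in a small helper; this is a restatement of textbook formulas, not code from the study, and the counts in the example are arbitrary.

```python
# Standard classification metrics computed from confusion-matrix counts.

def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    fpr = fp / (fp + tn)
    tpr = tp / (tp + fn)                       # recall
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"Accuracy": accuracy, "Precision": precision,
            "Specificity": specificity, "FPR": fpr, "TPR": tpr, "F1": f1}

m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print(round(m["Accuracy"], 2), round(m["TPR"], 2), round(m["F1"], 3))  # 0.85 0.8 0.842
```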
5.5 Datasets and their usages
In this study, several datasets were used for different purposes.
ChemDiv subset. For the compound library to be docked to each target, a ChemDiv library of 1.25 million purchasable compounds was clustered with a Tanimoto similarity threshold of 0.4, yielding a clustered subset of 287,216 compounds (named the ChemDiv subset; the SMILES-format files of the compounds are available in Additional file 2).
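The idea of this clustering step can be illustrated with a minimal, pure-Python leader-style sketch at the 0.4 Tanimoto threshold. Real pipelines would compute molecular fingerprints with a toolkit such as RDKit; the bit-set "fingerprints" below are toy stand-ins, and this is not necessarily the exact algorithm used in the study.

```python
# Greedy leader-style clustering: keep one representative per cluster;
# a compound starts a new cluster only if its Tanimoto similarity to every
# existing representative is at or below the threshold.

def tanimoto(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(fingerprints, threshold=0.4):
    """Return indices of cluster representatives (one per cluster)."""
    reps = []
    for i, fp in enumerate(fingerprints):
        if all(tanimoto(fp, fingerprints[r]) <= threshold for r in reps):
            reps.append(i)
    return reps

fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}]   # toy "on-bit" sets
print(leader_cluster(fps))  # [0, 2]: the second compound joins cluster 0
```

Applied to 1.25 million fingerprints with a 0.4 threshold, such a procedure yields one diverse representative subset of the library, as in the 287,216-compound ChemDiv subset.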
Test set from DUD-E dataset. The descriptions of the used target proteins are listed in Tab.7. This table also includes the numbers of their experimentally validated active compounds as well as decoy compounds from the DUD-E database [
53] and the evaluation of the ability of VinaLC and rDock to reproduce the crystal poses. The table shows that all RMSD-rDock and RMSD-VinaLC values are less than 2 Å; therefore, VinaLC and rDock are suitable for these target systems. These compounds are used as validation sets for DL-DockVS: all of them were docked with the two programs against the corresponding targets, but none were used in the training process. Their SMILES strings are listed in Additional file 4.
Test set from ChEMBL dataset. In the external test stage, we tested DL-DockVS performance on two self-constructed datasets. To evaluate the DL-DockVS-R models, a relatively small dataset was constructed by randomly selecting 500,000 compounds from the ChEMBL database (their SMILES strings are listed in Additional file 5). To evaluate DL-DockVS-C performance, compounds from the ChEMBL database were labeled as active or inactive according to the IC50 value for each target; their SMILES strings are provided in Additional file 6. The whole ChEMBL27 database, containing about 1.9 million molecules, was predicted and then filtered by the DL-DockVS-R models, and the filtered molecules were used as the input to the DL-DockVS-C models.
Test set from ZINC. The ZINC-15 dataset was used to illustrate the speed of DL-DockVS on a large dataset, with the target BRAF as an example.
The Author(s). Published by Higher Education Press.