Exploration on learning molecular docking with deep learning models
Qin Xie, Wei Ma, Jianhang Zhang, Shiliang Li, Xiaobing Deng, Youjun Xu, Weilin Zhang
Background: Molecular docking-based virtual screening (VS) aims to choose ligands with potential pharmacological activities from millions or even billions of molecules. This process could significantly cut down the number of compounds that need to be experimentally tested. However, many molecules have low affinity for a particular protein target, and docking them wastes considerable computational resources.
Methods: We implemented a fast and practical molecular screening approach called DL-DockVS (deep learning dock virtual screening) by using deep learning models (regression and classification models) to learn the outcomes of pipelined docking programs step-by-step.
Results: We showed that this approach could successfully weed out compounds with poor docking scores while keeping compounds with potentially high docking scores against 10 DUD-E protein targets. A self-built dataset of about 1.9 million molecules was used to further verify DL-DockVS, yielding good results in terms of recall rate, active-compound enrichment factor and runtime.
Conclusions: We comprehensively evaluated the practicality and effectiveness of DL-DockVS against 10 protein targets. Because it reduces runtime while maintaining the success rate, it is a useful and promising approach for screening ultra-large compound libraries in the age of big data. A model that has been trained well for one specific target can also be conveniently reused to predict high docking-score molecules in other chemical libraries without repeating the docking computation.
A deep learning-powered VS approach combined with two free docking programs is proposed and evaluated for screening an ultra-large compound library to obtain diverse potential active compounds rapidly and efficiently. We found that it is a practical and transferable strategy that significantly reduces computational cost.
molecular docking / ultra-large virtual screening / deep learning
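The recall rate and enrichment factor named in the Results can be illustrated with a minimal sketch. It assumes the standard textbook definitions, which may differ in detail from those used in the study; the function names, compound IDs and toy numbers below are hypothetical and are not taken from the paper.

```python
def recall(selected, reference_hits):
    """Fraction of reference hits (e.g. top docking-score compounds)
    that are still present after the model-based filter."""
    selected = set(selected)
    reference_hits = set(reference_hits)
    if not reference_hits:
        raise ValueError("reference_hits must not be empty")
    return len(selected & reference_hits) / len(reference_hits)


def enrichment_factor(selected, actives, library_size):
    """EF = (actives in selection / selection size)
            / (actives in library / library size).
    Written as a single ratio to avoid intermediate rounding."""
    selected = set(selected)
    actives = set(actives)
    if not selected or not actives or library_size <= 0:
        raise ValueError("selection, actives and library must be non-empty")
    hits = len(selected & actives)
    return (hits * library_size) / (len(selected) * len(actives))


# Hypothetical toy library: 1000 compounds with IDs 0..999, 10 of them active.
library_size = 1000
actives = set(range(10))
# Hypothetical filter output: 100 compounds kept, 8 of them active.
selected = set(range(8)) | set(range(10, 102))

toy_recall = recall(selected, actives)
toy_ef = enrichment_factor(selected, actives, library_size)
```

In this toy case the filter keeps 10% of the library while retaining 8 of the 10 actives, so the kept subset is eight times richer in actives than the full library.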