Analysis of protein features and machine learning algorithms for prediction of druggable proteins

Tanlin Sun, Luhua Lai, Jianfeng Pei

Quant. Biol. ›› 2018, Vol. 6 ›› Issue (4) : 334-343. DOI: 10.1007/s40484-018-0157-2
RESEARCH ARTICLE

Abstract

Background: Computational tools are widely used in the drug discovery process because they reduce time and cost. Predicting whether a protein is druggable is fundamental to the drug research pipeline. Sequence-based protein function prediction plays a vital role in many research areas. Training data, protein feature selection, and machine learning algorithms are three indispensable elements that determine the success of such models.

Methods: In this study, we tested the performance of different combinations of protein features and machine learning algorithms in predicting druggable proteins, using the targets of FDA-approved small molecules. We also enlarged the dataset to include the targets of small molecules in experimental or clinical investigation.

Results: We found that although the 146-d vector used by Li et al. with a neural network achieved the best training accuracy (91.10%), overlapped 3-gram word2vec with logistic regression achieved the best prediction accuracy on the independent test set (89.55%) and on newly approved targets. We also trained models on the enlarged dataset that included the targets of small molecules in experimental and clinical investigation; however, the best training accuracy was only 75.48%. In addition, we applied our models to predict potential targets as a reference for future studies.

Conclusions: Our study indicates the potential of word2vec for predicting druggable proteins. It also suggests that the training dataset of druggable proteins should not be extended to targets that lack verification. The target prediction package is available at https://github.com/pkumdl/target_prediction.
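
As an illustration of the "overlapped 3-gram" representation mentioned in the Results, the sketch below splits a protein sequence into overlapping 3-residue words with a stride of one, which is the usual reading of the term. It is not taken from the authors' package: the function name `overlapped_kmers` and the example sequence are hypothetical, and the later steps (word2vec embedding and logistic regression) are not shown.

```python
def overlapped_kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (stride 1).

    Hypothetical helper for illustration only; the tokenization used in
    the published package may differ in detail.
    """
    sequence = sequence.strip().upper()
    # A sequence shorter than k yields no words.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


if __name__ == "__main__":
    # Made-up 8-residue sequence, used only to show the windowing.
    print(overlapped_kmers("MKTAYIAK"))
```

Each resulting word would then be treated like a token in a sentence when training or looking up word2vec embeddings.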

Keywords

druggable protein / drug target / word2vec / deep learning

Cite this article

Tanlin Sun, Luhua Lai, Jianfeng Pei. Analysis of protein features and machine learning algorithms for prediction of druggable proteins. Quant. Biol., 2018, 6(4): 334‒343 https://doi.org/10.1007/s40484-018-0157-2

References

[1] The UniProt Consortium. (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res., 45, D158–D169
[2] Wishart, D. S., Feunang, Y. D., Guo, A. C., Lo, E. J., Marcu, A., Grant, J. R., Sajed, T., Johnson, D., Li, C., Sayeeda, Z., et al. (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res., 46, D1074–D1082
[3] Butcher, S. P. (2003) Target discovery and validation in the post-genomic era. Neurochem. Res., 28, 367–371
[4] Dundas, J., Ouyang, Z., Tseng, J., Binkowski, A., Turpaz, Y. and Liang, J. (2006) CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic Acids Res., 34, W116–W118
[5] Schmidtke, P., Le Guilloux, V., Maupetit, J. and Tufféry, P. (2010) fpocket: online tools for protein ensemble pocket detection and tracking. Nucleic Acids Res., 38, W582–W589
[6] Hussein, H. A., Borrel, A., Geneix, C., Petitjean, M., Regad, L. and Camproux, A.-C. (2015) PockDrug-Server: a new web server for predicting pocket druggability on holo and apo proteins. Nucleic Acids Res., 43, W436–W442
[7] Yuan, Y., Pei, J. and Lai, L. (2013) Binding site detection and druggability prediction of protein targets for structure-based drug design. Curr. Pharm. Des., 19, 2326–2333
[8] Hajduk, P. J., Huth, J. R. and Fesik, S. W. (2005) Druggability indices for protein targets derived from NMR-based screening data. J. Med. Chem., 48, 2518–2525
[9] Rose, P. W., Prlić, A., Altunkaya, A., Bi, C., Bradley, A. R., Christie, C. H., Costanzo, L. D., Duarte, J. M., Dutta, S. and Feng, Z. (2016) The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res., 45, D271–D281
[10] Mitsopoulos, C., Schierz, A. C., Workman, P. and Al-Lazikani, B. (2015) Distinctive behaviors of druggable proteins in cellular networks. PLoS Comput. Biol., 11, e1004597
[11] Lipinski, C. A. (2004) Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol., 1, 337–341
[12] Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. and Hopkins, A. L. (2012) Quantifying the chemical beauty of drugs. Nat. Chem., 4, 90–98
[13] Li, Q. and Lai, L. (2007) Prediction of potential drug targets based on simple sequence properties. BMC Bioinformatics, 8, 353
[14] Jamali, A. A., Ferdousi, R., Razzaghi, S., Li, J., Safdari, R. and Ebrahimie, E. (2016) DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov. Today, 21, 718–724
[15] Guo, Y., Yu, L., Wen, Z. and Li, M. (2008) Using support vector machine combined with auto covariance to predict protein-protein interactions from protein sequences. Nucleic Acids Res., 36, 3025–3030
[16] Shen, J., Zhang, J., Luo, X., Zhu, W., Yu, K., Chen, K., Li, Y. and Jiang, H. (2007) Predicting protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. USA, 104, 4337–4341
[17] Asgari, E. and Mofrad, M. R. (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One, 10, e0141287
[18] Wallach, H. M. (2006) Topic modeling: beyond bag-of-words. In ICML '06 Proceedings of the 23rd International Conference on Machine learning. pp. 977–984, Pittsburgh
[19] Xue, B., Fu, C. and Shaobin, Z. (2014) A study on sentiment computing and classification of sina weibo with word2vec. In 2014 IEEE International Congress on Big Data. pp. 358–363. Anchorage
[20] Chung, Y.-A., Wu, C.-C., Shen, C.-H., Lee, H.-Y. and Lee, L.-S. (2016) Audio word2vec: unsupervised learning of audio segment representations using sequence-to-sequence autoencoder. arXiv, 1603.00982
[21] Ngo, D. L., Yamamoto, N., Tran, V. A., Nguyen, N. G., Phan, D., Lumbanraja, F. R., Kubo, M. and Satou, K. (2016) Application of word embedding to drug repositioning. J. Biomed. Sci. Eng., 9, 7–16
[22] Kimothi, D., Soni, A., Biyani, P. and Hogan, J. M. (2016) Distributed Representations for Biological Sequence Analysis. arXiv:1608.05949
[23] Vang, Y. S. and Xie, X. (2017) HLA class I binding prediction via convolutional neural networks. Bioinformatics, 33, 2658–2665
[24] Kanehisa, M. and Goto, S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28, 27–30
[25] Zeng, Y. H., Guo, Y. Z., Xiao, R. Q., Yang, L., Yu, L. Z. and Li, M. L. (2009) Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. J. Theor. Biol., 259, 366–372
[26] Liu, T., Geng, X., Zheng, X., Li, R. and Wang, J. (2012) Accurate prediction of protein structural class using auto covariance transformation of PSI-BLAST profiles. Amino Acids, 42, 2243–2249
[27] Wang, Y.-C., Wang, X.-B., Yang, Z.-X. and Deng, N.-Y. (2010) Prediction of enzyme subfamily class via pseudo amino acid composition by incorporating the conjoint triad feature. Protein Pept. Lett., 17, 1441–1449
[28] Ottis, P., Toure, M., Cromm, P. M., Ko, E., Gustafson, J. L. and Crews, C. M. (2017) Assessing different E3 ligases for small molecule induced protein ubiquitination and degradation. ACS Chem. Biol., 12, 2570–2578

SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at https://doi.org/10.1007/s40484-018-0157-2.

ACKNOWLEDGEMENTS

This work was supported in part by the Ministry of Science and Technology of China (No. 2016YFA0502303) and the National Natural Science Foundation of China (Nos. 21673010 and 81273436).
The authors would like to thank Youjun Xu, Shuaishi Gao, and Qiwan Hu for discussions and advice.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Tanlin Sun, Luhua Lai and Jianfeng Pei declare that they have no conflicts of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2018 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature