WEDeepT3: predicting type III secreted effectors based on word embedding and deep learning

Xiaofeng Fu, Yang Yang

Quant. Biol. ›› 2019, Vol. 7 ›› Issue (4) : 293-301. DOI: 10.1007/s40484-019-0184-7
RESEARCH ARTICLE

Abstract

Background: Type III secreted effectors (T3SEs) are indispensable to the growth and reproduction of Gram-negative bacteria. In particular, the pathogenesis of these bacteria depends on T3SEs: by injecting T3SEs into a host cell, a bacterium can destroy the host cell’s immunity. The high diversity of T3SE sequences and the lack of defined secretion signals make T3SEs difficult to identify and predict, and the study of the pathological systems associated with T3SEs remains a hot topic in bioinformatics. Several computational tools have been developed to meet the growing demand for recognizing T3SEs and studying type III secretion systems (T3SSs). Although these tools can assist biological experiments in certain procedures, there is still room for improvement, even for the current best model, because the existing methods rely on hand-designed features and traditional machine learning methods.

Methods: In this study, we propose a deep learning-based predictor called WEDeepT3. Our work consists of three key steps. First, we train word embedding vectors for protein sequences on a large-scale amino acid sequence database. Second, we combine the word vectors with traditional features extracted from protein sequences, such as the position-specific scoring matrix (PSSM), to construct a more comprehensive feature representation. Finally, we construct a deep neural network model for the prediction of type III secreted effectors.
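The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the k-mer length (k = 3), the embedding dimension, the choice to pair each k-mer with the PSSM row at its start position, and the names `kmer_tokens` and `build_features` are all assumptions made for illustration. Random arrays stand in for trained word vectors and for a PSSM that would normally come from a profile search.

```python
import numpy as np

def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers, treated as 'words'."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_features(sequence, embeddings, pssm, k=3):
    """Concatenate each k-mer's embedding with the PSSM row at its start position.

    embeddings: dict mapping k-mer -> 1-D vector (hypothetical pretrained vectors)
    pssm:       array of shape (len(sequence), 20)
    """
    tokens = kmer_tokens(sequence, k)
    dim = len(next(iter(embeddings.values())))
    rows = []
    for i, token in enumerate(tokens):
        vec = embeddings.get(token, np.zeros(dim))  # unseen k-mers map to zeros
        rows.append(np.concatenate([vec, pssm[i]]))
    return np.stack(rows)

# Toy inputs: placeholders only, not real embeddings or a real PSSM.
rng = np.random.default_rng(0)
seq = "MKTAYIAK"
toy_embeddings = {tok: rng.normal(size=4) for tok in kmer_tokens(seq)}
toy_pssm = rng.integers(-5, 6, size=(len(seq), 20))

features = build_features(seq, toy_embeddings, toy_pssm)
print(features.shape)  # (6, 24): 6 k-mers, 4 embedding dims + 20 PSSM columns
```

A matrix of this shape, one row per sequence position, is the kind of input a convolutional model could then scan along the sequence axis.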

Results: The feature representation of WEDeepT3 combines word embeddings with position-specific features. Used together with convolutional neural networks, the new model outperforms state-of-the-art methods, demonstrating the effectiveness of the new feature representation and the strong learning ability of deep models.

Conclusion: WEDeepT3 exploits both the semantic information of k-mer fragments and the evolutionary information of protein sequences to accurately differentiate between T3SEs and non-T3SEs. WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.

Keywords

type III secreted effectors / word2vector / PSSM / feature representation

Cite this article

Xiaofeng Fu, Yang Yang. WEDeepT3: predicting type III secreted effectors based on word embedding and deep learning. Quant. Biol., 2019, 7(4): 293‒301 https://doi.org/10.1007/s40484-019-0184-7

ACKNOWLEDGEMENTS

This work has been supported by the National Natural Science Foundation of China (No. 61972251).

COMPLIANCE WITH ETHICS GUIDELINES

The authors Xiaofeng Fu and Yang Yang declare that they have no conflicts of interest.
This article does not contain any studies with human or animal subjects performed by any of the authors.

RIGHTS & PERMISSIONS

2019 Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature