Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai 200240, China
yangyang@cs.sjtu.edu.cn
History: Received 2019-06-17; Revised 2019-08-13; Accepted 2019-08-26; Issue Date 2019-10-24; Published 2019-12-15
Abstract
Background: Type III secreted effectors (T3SEs) are indispensable proteins in the growth and reproduction of Gram-negative bacteria. In particular, the pathogenesis of Gram-negative bacteria depends on T3SEs: by injecting T3SEs into a host cell, the bacteria can destroy the host cell's immunity. The high diversity of T3SE sequences and the lack of defined secretion signals make them difficult to identify and predict. Moreover, the study of the pathogenic systems associated with T3SEs remains a hot topic in bioinformatics. Some computational tools have been developed to meet the growing demand for the recognition of T3SEs and the study of type III secretion systems (T3SSs). Although these tools can assist biological experiments in certain procedures, there is still room for improvement, even for the current best model, because the existing methods adopt hand-designed features and traditional machine learning methods.
Methods: In this study, we propose a powerful predictor based on deep learning, called WEDeepT3. Our work consists of three key steps. First, we train word embedding vectors for protein sequences on a large-scale amino acid sequence database. Second, we combine the word vectors with traditional features extracted from protein sequences, such as the PSSM, to construct a more comprehensive feature representation. Finally, we construct a deep neural network model to predict type III secreted effectors.
Results: The feature representation of WEDeepT3 consists of both word embedding and position-specific features. Working together with convolutional neural networks, the new model achieves superior performance to the state-of-the-art methods, demonstrating the effectiveness of the new feature representation and the powerful learning ability of deep models.
Conclusion: WEDeepT3 exploits both the semantic information of k-mer fragments and the evolutionary information of protein sequences to accurately differentiate between T3SEs and non-T3SEs. WEDeepT3 is available at bcmi.sjtu.edu.cn/~yangyang/WEDeepT3.html.
Citation: Xiaofeng Fu, Yang Yang. WEDeepT3: predicting type III secreted effectors based on word embedding and deep learning. Quant. Biol., 2019, 7(4): 293–301. DOI: 10.1007/s40484-019-0184-7
The type III secreted effectors (T3SEs) play crucial roles in the interaction between bacteria and their hosts. They are produced by Gram-negative pathogenic bacteria and injected into host cells through a needle-like apparatus called the type III secretion system (T3SS) [1]; these bacteria include the vast majority of plant and animal pathogens, such as Pseudomonas, Erwinia, Xanthomonas, Ralstonia, Salmonella, Yersinia, Shigella and Escherichia [2,3]. Previous studies have shown that these toxic proteins can interfere with host immune signaling networks [1] and help pathogenic bacteria resist attacks from host immune systems [4]. Moreover, T3SEs can evolve distinct functional domains similar to those of host proteins to interfere with the normal metabolism of host cells [5], and new T3SEs can arise through adjustment of existing T3SE sequences [6].
T3SEs are essential to the virulence of pathogens, which makes them powerful probes for researchers exploring the immunity and functions of host cells. Therefore, efficient recognition and large-scale analysis of T3SEs can contribute to understanding the mechanism of the T3SS. Protein sequencing technologies, such as high-throughput sequencing, have developed rapidly in the past few decades, and protein data have grown greatly in both quality and quantity. However, the identification and analysis of T3SEs remain relatively slow because the experimental methods are labor-intensive, and a large proportion of T3SEs remain undiscovered [7]. As computational methods have been demonstrated to be useful for revealing unknown T3SEs [6], a number of machine learning-based predictors have been developed over the past decade [7–9]. Besides, as known T3SE sequences have accumulated rapidly, several large-scale T3SE databases have emerged, including T3SEdb [10], T3DB [11], etc.
Despite recent progress, the performance of these tools is limited by the feature representation of protein sequences and the learning capacity of the prediction model. Due to the lack of defined signals/motifs in known effectors, the recognition of T3SEs hinges on the feature representation of their amino acid sequences. The existing methods mainly adopt hand-designed features. For instance, Yang et al. [7,8] proposed the SSE-ACC method (amino acid composition in different secondary structures and solvent accessibility states) and topic models for T3SE recognition. Wang et al. [12] proposed a method to extract position-specific features, which records the position-specific occurrence frequency of each amino acid and composes features from the resulting profile.
Most of these studies adopted shallow learning methods to perform a binary classification (effector vs. non-effector). For example, although Fu et al. utilized continuous distributed features for representing amino acid sequences, they fed the features to support vector machines [9]. Some researchers enhanced the prediction performance by using a hierarchical classifier [13,14], i.e., a combination of homology search and machine learning, but this strategy has little advantage for hard targets, which have no homolog in the database of verified effectors. As aforementioned, T3SEs evolve fast and have high sequence diversity, so most unknown effectors cannot be identified via homology search. Therefore, how to extract informative features from amino acid sequences is key to the prediction of T3SEs. Moreover, over the past decade, deep learning methods have been successfully applied to many bioinformatics tasks involving sequence feature representation and classification, yet deep learning models have scarcely been employed in the recognition of type III effectors.
In this study, we focus on both the feature representation and deep models to enhance the prediction accuracy. In order to employ deep learning models, amino acid sequences first need to be encoded into numeric values. One-hot encoding is the most widely used method: for protein sequences, each residue is encoded as a 20-dimensional binary vector. However, one-hot encoding captures neither the context nor the latent correlations of the residues, and thus loses much important information. Instead of using discrete features, we generate continuous feature vectors to represent latent information in the amino acid sequences. Specifically, by regarding protein sequences as a special biological language and k-mers as words, we apply a word embedding algorithm similar to those used in natural language processing [15] to train distributed representations for k-mers through unsupervised learning. Besides the word embedding features, we also incorporate evolutionary information of amino acid sequences, i.e., the position-specific scoring matrix (PSSM), into the feature representation; position-specific features have been demonstrated to be among the most useful features in previous studies [12]. We then feed the combined feature vectors into a convolutional neural network, which further learns high-level abstract features for discriminating effectors from other proteins.
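As an illustration of the sparsity of this scheme, a minimal one-hot encoder might look as follows (a sketch; the residue ordering and helper name are our own):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence: str) -> np.ndarray:
    """Encode a protein sequence as an L x 20 binary matrix.

    Each row has a single 1 marking the residue type; the encoding
    carries no context and treats all residue pairs as equidistant,
    which is the limitation that word embeddings address.
    """
    matrix = np.zeros((len(sequence), 20), dtype=np.float32)
    for pos, residue in enumerate(sequence):
        if residue in AA_INDEX:  # skip non-standard residues
            matrix[pos, AA_INDEX[residue]] = 1.0
    return matrix
```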
We name the new method WEDeepT3 (Word Embedding and Deep learning for predicting T3SEs). To assess the model performance, we conduct experiments on a cross-species dataset and compare WEDeepT3 with the existing methods on a new independent test dataset. The experimental results show that WEDeepT3 compares favorably with existing predictors, and both the word embedding features and the deep learning classifier contribute to the performance enhancement.
RESULTS
Data sources
To compare models objectively, we use the same dataset as the current best model and collect a test set that has never been used before. We collect 713 T3SEs from BEAN 2.0 [13], the largest of the existing T3SE databases (e.g., T3SEdb [10], Effective [16] and Peffect [14]), and use CD-HIT [17] to remove sequence redundancy with a sequence identity cutoff of 40%, leaving 239 T3SEs. Meanwhile, we keep the same positive-to-negative ratio (1:2) as the training set of BEAN 2.0: 478 negative samples are selected from the non-T3SE proteins released in BEAN 1.0 [18], from which sequence redundancy has also been removed with the 40% cutoff.
The independent dataset is collected from the T3DB database [11], which is not used in the training procedure. CD-HIT is again applied to remove sequence redundancy with the same cutoff. Finally, we obtain 46 effectors and 92 non-effectors as the independent dataset. To verify the separation between the two sets, we align the independent test set against the training set: only three BLAST hits are found, all with identity below 40%, suggesting that there is essentially no overlap between the independent test set and the training set.
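As a sketch, the redundancy-removal step could be scripted as follows, assuming the CD-HIT binary is on the PATH (file names are illustrative):

```python
import subprocess

def remove_redundancy(input_fasta: str, output_fasta: str, identity: float = 0.4) -> None:
    """Cluster sequences with CD-HIT and keep one representative per cluster.

    For identity thresholds in [0.4, 0.5), CD-HIT requires word size -n 2.
    """
    subprocess.run(
        ["cd-hit",
         "-i", input_fasta,    # input FASTA file
         "-o", output_fasta,   # representative (non-redundant) sequences
         "-c", str(identity),  # sequence identity cutoff (40%)
         "-n", "2"],           # word size appropriate for a 0.4 cutoff
        check=True,
    )

# Example (illustrative file names):
# remove_redundancy("bean2_t3se.fasta", "t3se_nr40.fasta")
```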
Experimental settings and evaluation criteria
In WEDeepT3, we implement a deep neural network consisting of two one-dimensional convolutional layers and two fully connected layers. Small convolution kernels help reduce the number of trainable parameters and alleviate overfitting; the two kernel sizes are set to 5 and 3, respectively. The mini-batch size is 64 and the dropout probability is 0.5. For the focal loss, we use the recommended parameters, i.e., α is 0.25 and γ is 2. In the comparative experiments, we also test the traditional SVM classifier with different feature vectors (the results are discussed in the section "Investigation on the feature representation and classifier"). Our implementation of SVMs adopts the RBF kernel, with the parameters C and γ obtained via a grid search using nested cross-validation.
To assess the model performance, we use four metrics: precision (Equation (1)), recall (Equation (2)), total accuracy (TA) (Equation (3)) and F1-score (F1) (Equation (4)).
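For completeness, these are the standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN):

$$\text{Precision} = \frac{TP}{TP+FP}, \tag{1}$$

$$\text{Recall} = \frac{TP}{TP+FN}, \tag{2}$$

$$TA = \frac{TP+TN}{TP+TN+FP+FN}, \tag{3}$$

$$F1 = \frac{2\times \text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}. \tag{4}$$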
Investigation on the feature representation and classifier
As aforementioned, WEDeepT3 consists of three basic elements: word embedding features, position-specific features and a CNN classifier. In this section, we investigate the contribution of each of these three parts.
To assess the impact of the word length, as well as the integration strategy for generating sequence representations from word vectors, we experiment with five word lengths, i.e., k from 1 to 5. As can be seen in Figure 1, the best performance is obtained when k equals 3. The 2-mers, 3-mers and 4-mers yield relatively close performance, while 1-mers and 5-mers perform much worse, indicating that an appropriate k-mer length is crucial to the accuracy. We did not examine larger values of k, as they would incur a much higher computational cost in training the word embeddings on the corpus.
As many previous studies adopted the PSSM as the major source of information for predicting T3SEs [13,14], and word embedding (WE) is a new kind of feature representation, we compare the performance of these two types of features. Meanwhile, we add two commonly used and efficient feature extraction methods (PC-PseAAC and SC-PseAAC [19]) as baselines. Specifically, we implement four predictors, corresponding to different combinations of the two feature extraction methods and two classifiers, namely WE+SVM, PSSM+SVM, WE+PSSM+SVM and WEDeepT3 (which can also be denoted WE+PSSM+CNN), where WE+PSSM represents the combination of the word embedding and PSSM feature vectors. The results of the four predictors are shown in Figure 2.
As can be seen in Figure 2, the embedding method performs very close to the PSSM feature extraction method and far exceeds the two traditional feature extraction methods, PC-PseAAC and SC-PseAAC, when working with SVMs. Note that the word embedding vectors and PSSM feature vectors have dimensionalities of 120 and 400, respectively. With much lower dimensionality, the embedding method demonstrates its powerful representation ability in protein classification tasks.
Furthermore, we combine the embedding vector and the PSSM vector to obtain 520-D feature vectors for classification. The results show that this combination yields a significant performance enhancement: compared with the 400-D PSSM features, the combined feature vectors increase the recall by 12.9% and the F1-score by 6.8%. This result suggests that the word embedding features and the PSSM features are complementary, as the former capture the semantic correlation between words while the latter focus on position-specific information.
Figure 2 also shows an obvious performance gap between SVMs and CNNs. Considering the limited training data and the risk of overfitting, we use only two convolutional layers, yet the CNN still outperforms the SVMs by a large margin. The results show that the CNN component of WEDeepT3 is well suited to this classification task, achieving a precision of 100% and a total accuracy of nearly 100%. All the above experimental results are obtained from ten runs of ten-fold cross-validation on the same data, i.e., accuracies are averaged over 100 tests.
Comparison with the state-of-the-art predictors
For a fair comparison, we collect an independent test set and compare WEDeepT3 with six other methods on this set: BPBAac [12], EffectiveT3 [20], T3_MM [21], DeepT3 [22], Bastion3 [23] and BEAN 2.0 [13]. All of these methods have publicly accessible tools with updates released in the past three years. The predictions are obtained from their web servers or executable programs, and the results are shown in Table 1.
As can be seen, WEDeepT3 achieves the best performance, with a total accuracy of 81.2% and an F1-score of 70.5%; its total accuracy is over 5% higher than that of the second-best method, BEAN 2.0. BEAN 2.0 performs close to WEDeepT3, with a higher recall but a lower precision, i.e., a higher false positive rate than WEDeepT3. Since the two methods use the same positive-to-negative ratio in the training set, a potential reason for the higher false positive rate is that BEAN 2.0 uses homology search as its initial classification step, and some homologs of known effectors are not necessarily true effectors. By contrast, although BPBAac obtains the highest precision, its recall is much lower than that of the other methods, resulting in poor total accuracy and F1-score. DeepT3 and Bastion3 are the latest prediction methods for T3SEs: DeepT3 uses a simple one-hot encoding and a deep learning framework, while Bastion3 assembles multiple traditional feature vectors and uses GBDT as the classifier. Our method is superior to both, likely owing to its feature representation of amino acid sequences and its classifier architecture. Considering the low identity between the independent test set and the training set, these results support the generalization ability of WEDeepT3.
Visualization of the word embeddings
To gain more insight into the features captured by the word embeddings, we map the high-dimensional embedding vectors to a 2D space using t-SNE [24]. Specifically, we focus on the first 50 amino acids in the N-terminals of the T3SEs in the training set, i.e., all the 3-mers in the first 50 amino acids are mapped into the 2D space, as shown in Figure 3. The bigger the word, the more frequently it occurs.
Interestingly, the words (i.e., 3-mers) form distributed clusters. The biggest cluster is located at the bottom of the figure; within it, the most frequent word is SSS, and nearly all the words have S at the center, e.g., SSA, SSK, YSS, ASP and ISN. This is consistent with an observation in previous studies that the first 50 amino acids of P. syringae effectors have a high proportion of Ser [6]. Moreover, in most of the clusters, the words share a common center letter. Words within a cluster have close embedding vectors, indicating that they may be interchangeable in context. By contrast, the hand-designed features used in previous studies, such as k-mer frequencies, are discrete and cannot capture the semantic correlation between words/k-mers or measure the distance between them. The 2D map shows that the word vectors effectively represent both residue and context information, which underlies the performance improvement.
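A minimal sketch of this projection, assuming a trained gensim Word2Vec model over the 3-mers (variable names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Assume `model` is a trained gensim Word2Vec model and `kmers` is the
# list of 3-mers drawn from the first 50 N-terminal residues of T3SEs.
vectors = np.array([model.wv[w] for w in kmers])

# Project the 120-D embeddings down to 2-D for visualization.
coords = TSNE(n_components=2, random_state=0).fit_transform(vectors)

plt.figure(figsize=(8, 8))
for (x, y), word in zip(coords, kmers):
    plt.text(x, y, word, fontsize=8)  # font size could be scaled by frequency
plt.xlim(coords[:, 0].min(), coords[:, 0].max())
plt.ylim(coords[:, 1].min(), coords[:, 1].max())
plt.show()
```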
DISCUSSION
Word embedding has become an indispensable foundation of NLP technology. Once trained, the protein embeddings can be applied to any protein sequence representation task; this universality and ease of use make word embeddings stand out among protein representation methods. Although the current experiments demonstrate the good performance of the word vectors, the method can be explored further: our segmentation method is relatively simple and may not distinguish useful words from useless ones, which introduces noise into the training and prediction system. Thus, one of our future research directions is to develop an automatic segmentation method and define more flexible words of varying length. Besides, the deep learning model used in WEDeepT3 is a simple CNN. As the number of validated effectors increases, enabling the training of much deeper networks, we will explore more complex network architectures to further improve the accuracy.
CONCLUSIONS
In this paper, we propose a deep learning method to predict type III secreted effectors. First, we use an overlapping window to segment protein sequences into words (k-mers). Then we convert the words into numerical vectors using word embeddings pre-trained on a large corpus of protein sequences, and integrate the word vectors in each sequence to obtain a sequence vector. Further, we incorporate PSSM information into the predictor, so that it exploits both the semantic information of the k-mers and evolutionary information. Using a convolutional neural network, we construct a system that effectively distinguishes T3SEs from non-T3SEs and outperforms most existing prediction methods. In addition, this computational method is broadly applicable in biological sequence research and can be adapted to various sequence analysis tasks.
METHODS
The key components of WEDeepT3 include word embedding features, PSSM features and a convolutional neural network (CNN), and there are five major steps in constructing the predictor: (1) segmenting protein sequences into words; (2) learning the word embedding vectors; (3) obtaining the continuous word representation of protein sequences; (4) extracting PSSM vectors from sequence profiles; (5) training the CNN classifier. Figure 4 shows the flowchart of WEDeepT3. Details of these five steps are described in the following subsections.
Word definition for protein sequences
The first type of features in WEDeepT3 is a continuous distributed representation of protein sequences, based on the assumption that a biological sequence can be viewed as a sentence written in a special language [25]. However, this language has no well-defined words. Inspired by Asgari et al. [26], we use residue segments (k-mers) as biological words, whose syntax and semantics may correspond to molecular structure and biological function. Analogous to natural language processing tasks, we convert k-mers into word embeddings and apply them to the inference of molecular structure, dynamics and function. Before that, we must convert a complete, gap-free residue sequence into a list of words according to certain rules.
The previous study [26] used a shifted non-overlapping method, which segments a sequence of length L into multiple lists of L/k words each, according to different starting segmentation sites. However, such a method effectively splits the information of a sequence into multiple copies: each list of words contains only part of the information, and the resulting word vectors may lose important information, such as the relationships between residues that fall into different words. In this study, we adopt n-gram modeling of protein sequences, segmenting a sequence into a list of fixed-length words with an overlapping window of length k (Figure 4 shows an example where k equals 3). This method retains more sequence information than non-overlapping segmentation.
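A minimal sketch of this overlapping segmentation (the helper name is ours):

```python
def segment_sequence(sequence: str, k: int = 3) -> list[str]:
    """Slide a window of length k over the sequence with stride 1,
    yielding the L - k + 1 overlapping words (k-mers)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

# Example: "MKSLITP" with k = 3 ->
# ['MKS', 'KSL', 'SLI', 'LIT', 'ITP']
```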
Generation of embedding vectors of words
With the rise of deep learning techniques in natural language processing (NLP), various word embedding methods have been developed to represent words, sentences and texts by continuous vectors, such as Word2Vec [15] and GloVe [27]. All these algorithms require a large corpus to train the word embeddings.
To adapt word embedding methods to protein sequences, a very large protein database is necessary. Initially, Asgari et al. [26] used the Swiss-Prot database as the corpus for word embeddings, which contains fewer than 560,000 sequences; compared with the corpora used in NLP tasks, Swiss-Prot is much smaller. A growing body of research, together with our own experimental results, indicates that a larger corpus yields better embeddings. In this study, we therefore adopt UniRef50 as the corpus, which contains more than 25,000,000 sequences, more than the vast majority of NLP corpora and far exceeding Swiss-Prot in the number of sequences. We do not adopt an even larger dataset such as UniRef90, because UniRef50 has lower redundancy, which effectively alleviates the issues caused by high sequence identity.
Given the corpus, we adopt the commonly used Word2Vec algorithm to train the word embeddings for protein sequences. Word2Vec captures contextual information and trains a fixed-length continuous feature vector for each word. There are two ways to implement the Word2Vec algorithm; here we adopt the Skip-gram model, whose objective is to maximize the sum of the log-likelihoods of each word and its context, as defined in Equation (5),

$$\frac{1}{N}\sum_{i=1}^{N}\sum_{-c\le j\le c,\, j\ne 0}\log p(\omega_{i+j}\mid \omega_i), \tag{5}$$

where N is the number of word vectors in the sequence, c is half of the window size, $\omega_i$ is the center word and $\omega_{i+j}$ is one of its context words. The probability is defined by the softmax function:

$$p(\omega_O\mid \omega_I)=\frac{\exp\!\left({v'_{\omega_O}}^{\top} v_{\omega_I}\right)}{\sum_{w=1}^{W}\exp\!\left({v'_{w}}^{\top} v_{\omega_I}\right)}, \tag{6}$$

where $v_{\omega}$ and $v'_{\omega}$ are the input and output vectors of ω, and W is the total number of words in the corpus.
In natural languages, the number of common words is usually in the tens of thousands, while the number of k-mers in protein sequences is much larger when k is greater than 4. Therefore, the softmax parameters in Equation (6) may be very difficult to fit. Here, we use the negative sampling method to improve computational efficiency: instead of summing over all W words when computing the probability in Equation (6), we sample a small number of negative words, so that evaluating the softmax is no longer time- or resource-intensive. The probability that a word is selected as a negative sample is not uniform but depends on its frequency of occurrence, following the unigram distribution (raised to the 3/4 power, as in standard Word2Vec implementations) defined in Equation (7),

$$P(\omega_i)=\frac{f(\omega_i)^{3/4}}{\sum_{j=1}^{W} f(\omega_j)^{3/4}}, \tag{7}$$

where f(ω) is the frequency of ω.
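As a sketch, training such skip-gram embeddings with negative sampling could be done with gensim as follows; the 120-D vector size matches the dimensionality reported above, while the window size and other settings are illustrative assumptions:

```python
from gensim.models import Word2Vec

# `corpus` is an iterable of word lists, one per protein sequence,
# produced by the overlapping 3-mer segmentation above, e.g.,
# corpus = [segment_sequence(seq, k=3) for seq in uniref50_sequences]

model = Word2Vec(
    sentences=corpus,
    vector_size=120,  # embedding dimension (gensim >= 4; `size` in older versions)
    sg=1,             # skip-gram rather than CBOW
    negative=5,       # negative sampling instead of the full softmax
    window=5,         # context window size (an assumption here)
    min_count=1,      # keep rare k-mers
    workers=4,
)
model.save("kmer_embeddings.model")
```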
Continuous representation of protein sequences
The recognition of T3SEs is a protein-level classification task, so we need to integrate the vectors of the k-mers belonging to a protein sequence into a single sequence vector and then use it for downstream classification. Many previous studies utilized only N-terminal residues for prediction, while other studies showed that the non-signal-peptide region is also helpful for the recognition of T3SEs [28]. Moreover, although T3SEs may carry secretion signals at their N-terminals, these signals show limited sequence conservation. Therefore, we extract features from the full-length sequences for prediction.
There are some simple ways to aggregate the word embeddings into a combined representation for a protein sequence. For instance, the word vectors can be concatenated, as shown in Equation (8),

$$c = v_{\omega_1}\oplus v_{\omega_2}\oplus \cdots \oplus v_{\omega_{L-k+1}}, \tag{8}$$

where the operator ⊕ denotes vector concatenation. Since the overlapping window of length k segments a sequence of length L into (L−k+1) words, each sequence is represented by a vector of (L−k+1) × d dimensions. The resulting sequence embeddings thus have varying lengths, and the dimensionality is high.

To avoid this issue, each sequence can instead be represented by summing (Equation (9)) or averaging (Equation (10)) all the word vectors in the sequence:

$$c = \sum_{i=1}^{L-k+1} v_{\omega_i}, \tag{9}$$

$$c = \frac{1}{L-k+1}\sum_{i=1}^{L-k+1} v_{\omega_i}, \tag{10}$$

where $\omega_i$ denotes a word, $v_{\omega_i}$ denotes the vector of $\omega_i$, and c is the feature vector for the whole sequence. In both cases, the dimensionality of every sequence vector is d, the same as the dimension of the word vectors. In our previous work [9], we assessed the performance of both the sum vectors and the mean vectors for representing protein sequences.
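A minimal sketch of the averaging strategy in Equation (10), reusing the hypothetical segmentation helper and trained model from the sketches above:

```python
import numpy as np

def sequence_vector(sequence: str, model, k: int = 3) -> np.ndarray:
    """Average the embedding vectors of all overlapping k-mers in a
    sequence, yielding one fixed-length d-dimensional feature vector."""
    words = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    vectors = [model.wv[w] for w in words if w in model.wv]
    return np.mean(vectors, axis=0)
```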
Extracting evolutionary features from sequence profiles
The second type of features in WEDeepT3 is extracted from the sequence profile of each protein, i.e., the position-specific scoring matrix (PSSM), one of the most important features in biological sequence analysis. The PSSM of a query protein is an L × 20 matrix, where L is the length of the protein sequence. It assigns a score {Pij | i = 1,...,L and j = 1,...,20} to the jth amino acid at the ith position of the query sequence, with a large value indicating a highly conserved position and a small value a weakly conserved one. Position-Specific Iterated BLAST (PSI-BLAST) [29], which detects remotely related homologous proteins, is the most commonly used program for generating PSSM profiles. In this paper, we run PSI-BLAST with three iterations and an e-value threshold of 0.0001 against the UniRef50 database to generate each PSSM profile. Many methods, such as PSSM-AAC and PSSM-DC, have been developed to extract PSSM features efficiently [30,31]. Here we adopt the method developed by Jeong et al. [32], which focuses on domains with similar conservation rates: each of the 20 amino acid types serves as a probe, and for each probe (column) of the PSSM, we average the score rows over the positions whose value in that column is greater than 0. This gives a 20-dimensional vector per probe, and a 400-dimensional vector from all 20 probes.
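A minimal sketch of one reading of this probe-wise averaging (our interpretation; the PSSM is assumed to be an L × 20 NumPy array):

```python
import numpy as np

def pssm_features(pssm: np.ndarray) -> np.ndarray:
    """Condense an L x 20 PSSM into a 400-D vector.

    Each of the 20 columns acts as a probe: positions scoring > 0 for
    that probe are selected, and their full 20-D score rows are
    averaged, giving one 20-D vector per probe (zeros if no position
    qualifies). Concatenating the 20 probe vectors yields 400 values.
    """
    features = []
    for probe in range(20):
        mask = pssm[:, probe] > 0  # positions conserved for this probe
        if mask.any():
            features.append(pssm[mask].mean(axis=0))
        else:
            features.append(np.zeros(20))
    return np.concatenate(features)  # shape: (400,)
```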
The classifier
Here, we adopt a deep neural network as the classifier, consisting of two convolutional blocks and two fully connected layers. Each convolutional block consists of a convolutional layer, a max-pooling layer, a dropout layer and an activation layer, where the activation uses the ReLU function. In addition, as effectors are far fewer than non-effectors in nature, our dataset has an imbalanced class distribution; we use the focal loss [33] to alleviate this data-imbalance issue.
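A minimal PyTorch sketch consistent with the settings reported earlier (two Conv1d blocks with kernel sizes 5 and 3, dropout 0.5, focal loss with α = 0.25 and γ = 2); the channel widths and input length are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WEDeepT3Net(nn.Module):
    def __init__(self, in_channels: int = 1, seq_len: int = 520):
        super().__init__()
        def block(c_in, c_out, kernel):
            # conv -> max-pool -> dropout -> ReLU, as described above
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=kernel, padding=kernel // 2),
                nn.MaxPool1d(2),
                nn.Dropout(0.5),
                nn.ReLU(),
            )
        self.conv1 = block(in_channels, 16, 5)  # first block, kernel size 5
        self.conv2 = block(16, 32, 3)           # second block, kernel size 3
        self.fc1 = nn.Linear(32 * (seq_len // 4), 64)
        self.fc2 = nn.Linear(64, 2)             # effector vs. non-effector

    def forward(self, x):
        x = self.conv2(self.conv1(x))
        x = x.flatten(1)
        return self.fc2(F.relu(self.fc1(x)))

def focal_loss(logits, targets, alpha: float = 0.25, gamma: float = 2.0):
    """A simplified focal loss [33]: down-weights easy examples so
    training focuses on hard, minority-class samples."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    p_t = torch.exp(-ce)  # probability assigned to the true class
    return (alpha * (1 - p_t) ** gamma * ce).mean()
```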
REFERENCES
[1]
Galán, J. E. and Wolf-Watz, H. (2006) Protein delivery into eukaryotic cells by type III secretion machines. Nature, 444, 567–573
[2]
He, S. Y., Nomura, K. and Whittam, T. S. (2004) Type III protein secretion mechanism in mammalian and plant pathogens. Biochim. Biophys. Acta, 1694, 181–206
[3]
Cornelis, G. R. (2006) The type III secretion injectisome. Nat. Rev. Microbiol., 4, 811–825
[4]
Brodsky, I. E. and Medzhitov, R. (2009) Targeting of immune signalling networks by bacterial pathogens. Nat. Cell Biol., 11, 521–526
[5]
Dean, P. (2011) Functional domains and motifs of bacterial type III effector proteins and their roles in infection. FEMS Microbiol. Rev., 35, 1100–1125
[6]
Guttman, D. S., McHardy, A. C. and Schulze-Lefert, P. (2014) Microbial genome-enabled insights into plant-microorganism interactions. Nat. Rev. Genet., 15, 797–813
[7]
Yang, Y., Zhao, J., Morgan, R. L., Ma, W. and Jiang, T. (2010) Computational prediction of type III secreted proteins from gram-negative bacteria. BMC Bioinformatics, 11, S47
[8]
Yang, Y. and Qi, S. (2014) A new feature selection method for computational prediction of type III secreted effectors. Int. J. Data Min. Bioinform., 10, 440–454
[9]
Fu, X., Xiao, Y. and Yang, Y. (2018) Prediction of Type III Secreted Effectors Based on Word Embeddings for Protein Sequences. In: Bioinformatics Research and Applications, Zhang, F., Cai, Z., Skums, P., Zhang, S. (eds). Lecture Notes in Computer Science, vol 10847. Springer, Cham
[10]
Tay, D. M., Govindarajan, K. R., Khan, A. M., Ong, T. Y., Samad, H. M., Soh, W. W., Tong, M., Zhang, F. and Tan, T. W. (2010) T3SEdb: data warehousing of virulence effectors secreted by the bacterial Type III Secretion System. BMC Bioinformatics, 11, S4
[11]
Wang, Y., Huang, H., Sun, M., Zhang, Q. and Guo, D. (2012) T3DB: an integrated database for bacterial type III secretion system. BMC Bioinformatics, 13, 66
[12]
Wang, Y., Zhang, Q., Sun, M. A. and Guo, D. (2011) High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles. Bioinformatics, 27, 777–784
[13]
Dong, X., Lu, X. and Zhang, Z. (2015) Bean 2.0: an integrated web resource for the identification and functional analysis of type III secreted effectors. Database, 2015, bav064
[14]
Goldberg, T., Rost, B. and Bromberg, Y. (2016) Computational prediction shines light on type III secretion origins. Sci. Rep., 6, 34516
[15]
Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013) Efficient estimation of word representations in vector space. arXiv: 1301.3781
[16]
Jehl, M.-A., Arnold, R. and Rattei, T. (2011) Effective—a database of predicted secreted bacterial proteins. Nucleic Acids Res., 39, D591–D595
[17]
Li, W. and Godzik, A. (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 22, 1658–1659
[18]
Dong, X., Zhang, Y.-J. and Zhang, Z. (2013) Using weakly conserved motifs hidden in secretion signals to identify type-III effectors from bacterial pathogen genomes. PLoS One, 8, e56632
[19]
Chou, K. C. (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins, 43, 246–255
[20]
Arnold, R., Brandmaier, S., Kleine, F., Tischler, P., Heinz, E., Behrens, S., Niinikoski, A., Mewes, H. W., Horn, M. and Rattei, T. (2009) Sequence-based prediction of type III secreted proteins. PLoS Pathog., 5, e1000376
[21]
Wang, Y., Sun, M., Bao, H. and White, A. P. (2013) T3_MM: a Markov model effectively classifies bacterial type III secretion signals. PLoS One, 8, e58173
[22]
Xue, L., Tang, B., Chen, W. and Luo, J. (2019) DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence. Bioinformatics, 35, 2051–2057
[23]
Wang, J., Li, J., Yang, B., Xie, R., Marquez-Lago, T. T., Leier, A., Hayashida, M., Akutsu, T., Zhang, Y. and Chou, K.-C. (2019) Bastion3: a two-layer ensemble predictor of type III secreted effectors. Bioinformatics, 35, 2017–2028
[24]
van der Maaten, L. and Hinton, G. (2008) Visualizing data using t-SNE. J. Mach. Learn. Res., 9, 2579–2605
[25]
Klein-Seetharaman, J., Reddy, R. (2002) Biological language modeling: Convergence of computational linguistics and biological chemistry. In: Converging Technologies for Improving Human Performance, pp. 378, Springer
[26]
Asgari, E. and Mofrad, M. R. (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS One, 10, e0141287
[27]
Pennington, J., Socher, R. and Manning, C. (2014) Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543
[28]
Deng, W., Marshall, N. C., Rowland, J. L., McCoy, J. M., Worrall, L. J., Santos, A. S., Strynadka, N. C. J. and Finlay, B. B. (2017) Assembly, structure, function and regulation of type III secretion systems. Nat. Rev. Microbiol., 15, 323–337
[29]
Altschul, S. F. and Koonin, E. V. (1998) Iterated profile searches with PSI-BLAST—a tool for discovery in protein databases. Trends Biochem. Sci., 23, 444–447
[30]
Zuo, Y. C., Chen, W., Fan, G. L. and Li, Q. Z. (2013) A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids, 44, 573–580
[31]
Zuo, Y. C. and Li, Q. Z. (2009) Using reduced amino acid composition to predict defensin family and subfamily: Integrating similarity measure and structural alphabet. Peptides, 30, 1788–1793
[32]
Jeong, J. C., Lin, X. and Chen, X. W. (2011) On position-specific scoring matrix for protein function prediction. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 8, 308–315
[33]
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P. (2017) Focal loss for dense object detection. IEEE T. Pattern Anal. Mach. Intell., 99, 2999–3007
RIGHTS & PERMISSIONS
Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature