TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks
Zohreh Baharvand Irannia, Ting Chen
Background: A main goal of metagenomics is the taxonomic characterization of microbial communities. Although sequence comparison has been the main method for taxonomic classification, there is no clear agreement on how to calculate similarity or which similarity thresholds to use, especially at higher taxonomic levels such as phylum and class. Taxonomic classification of novel metagenomic sequences that lack close homologs in the biological databases therefore remains a challenge.
Methods: In this study, we propose using the co-abundance associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities to assist taxonomic classification. We developed a Markov Random Field model that predicts the taxa of unknown microorganisms from these co-abundance associations.
Results: Although such associations are intrinsically functional, we demonstrate that they are strongly correlated with taxonomic associations and can be combined with sequence comparison methods to predict the taxonomic origins of unknown microorganisms at the phylum and class levels.
Conclusions: With the ever-increasing accumulation of sequence data from microbial communities, we take a first step toward using these associations for taxonomic identification beyond sequence similarity.
Availability and Implementation: The source code of TACO is freely available at https://github.com/baharvand/OTU-Taxonomy-Identification. It is implemented in C++ and supported on Linux and MS Windows.
Keywords: metagenomics / 16S rRNA gene / taxonomic profiling / taxonomic prediction / Markov Random Field / OTU co-abundance network
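The idea summarized in the Methods, predicting the taxon of an unknown OTU from the taxa of its co-abundant neighbors, can be illustrated with a minimal sketch. This is not the TACO implementation. It assumes that edges are positive Pearson correlations above a hypothetical threshold of 0.8, that the conditional probability of a taxon given the neighbors follows a simple Potts-style form (log prior plus `beta` times the number of neighbors carrying that taxon), and that inference is done by iterated conditional modes. The function names, the `beta` parameter, and the toy abundance values are all illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / math.sqrt(vx * vy)


def build_network(abundance, threshold=0.8):
    """Link OTUs whose profiles correlate at or above the (assumed) threshold."""
    neighbors = {otu: set() for otu in abundance}
    for a, b in combinations(abundance, 2):
        if pearson(abundance[a], abundance[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def predict_taxa(neighbors, known, beta=1.0, sweeps=10):
    """Label unknown OTUs by iterated conditional modes on a Potts-style MRF.

    Assumed conditional: P(taxon t | neighbors) is proportional to
    prior(t) * exp(beta * number of neighbors labeled t).
    Returns {otu: (most probable taxon, its conditional probability)}.
    """
    taxa = sorted(set(known.values()))
    # Prior taken from taxon frequencies among the known OTUs.
    prior = {t: sum(v == t for v in known.values()) / len(known) for t in taxa}
    labels = dict(known)
    unknown = [o for o in neighbors if o not in known]
    result = {}
    for _ in range(sweeps):
        changed = False
        for otu in unknown:
            counts = {t: 0 for t in taxa}
            for nb in neighbors[otu]:
                if nb in labels:
                    counts[labels[nb]] += 1
            scores = {t: math.log(prior[t]) + beta * counts[t] for t in taxa}
            top = max(scores.values())
            norm = sum(math.exp(s - top) for s in scores.values())
            probs = {t: math.exp(scores[t] - top) / norm for t in taxa}
            best = max(taxa, key=probs.get)
            result[otu] = (best, probs[best])
            if labels.get(otu) != best:
                labels[otu] = best
                changed = True
        if not changed:
            break  # labels are stable, so further sweeps change nothing
    return result


# Toy data (made up for illustration): rows are OTUs, columns are samples.
abundance = {
    "otu1": [1, 2, 3, 4, 5],
    "otu2": [2, 4, 6, 8, 10],
    "otu3": [5, 4, 3, 2, 1],
    "otu4": [10, 8, 6, 4, 2],
    "otuX": [1.1, 2.0, 3.2, 3.9, 5.1],  # unknown OTU
}
known = {
    "otu1": "Firmicutes",
    "otu2": "Firmicutes",
    "otu3": "Bacteroidetes",
    "otu4": "Bacteroidetes",
}

network = build_network(abundance)
predictions = predict_taxa(network, known)
print(predictions["otuX"])
```

In this toy example the unknown OTU rises and falls with the two Firmicutes OTUs, so both of its network neighbors vote for that phylum. A fuller model would also weigh sequence-similarity evidence, which the abstract says can be combined with the co-abundance signal.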