TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks

Zohreh Baharvand Irannia, Ting Chen

Quant. Biol. ›› 2016, Vol. 4 ›› Issue (3) : 149-158. DOI: 10.1007/s40484-016-0073-2
RESEARCH ARTICLE


Abstract

Background: A main goal of metagenomics is the taxonomic characterization of microbial communities. Although sequence comparison has been the main method for taxonomic classification, there is no clear agreement on how to calculate similarity or on which similarity thresholds to use, especially at higher taxonomic levels such as phylum and class. The taxonomic classification of novel metagenomic sequences without close homologs in biological databases therefore remains a challenge.

Methods: In this study, we propose to use the co-abundance associations between taxa/operational taxonomic units (OTUs) across complex and diverse communities to assist taxonomic classification. We developed a Markov Random Field model that predicts the taxa of unknown microorganisms from these co-abundance associations.
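
The general idea of the Methods paragraph can be illustrated with a minimal sketch. It is not the TACO implementation (which is written in C++) and does not reproduce the model in the paper. It assumes Pearson correlation with a fixed threshold to build the co-abundance network, a simple Potts-style score (log prior plus `beta` times the number of neighbours carrying a taxon), and iterated conditional modes for inference. All function names, parameters, and abundance values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def build_network(abundance, threshold=0.8):
    """Link two OTUs when their profiles correlate at or above the threshold (assumed rule)."""
    otus = list(abundance)
    edges = {o: set() for o in otus}
    for i, a in enumerate(otus):
        for b in otus[i + 1:]:
            if pearson(abundance[a], abundance[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges

def predict_taxa(edges, known, beta=1.0, sweeps=10):
    """Assign taxa to unlabeled OTUs by iterated conditional modes.

    Score of taxon t for an OTU = log prior(t) + beta * (labeled neighbours with taxon t).
    The prior is the taxon frequency among known OTUs (an assumption of this sketch).
    """
    taxa = sorted(set(known.values()))
    prior = {t: sum(1 for v in known.values() if v == t) / len(known) for t in taxa}
    labels = dict(known)
    unknown = [o for o in edges if o not in known]
    for _ in range(sweeps):
        changed = False
        for o in unknown:
            labeled = [labels[nb] for nb in edges[o] if nb in labels]
            if not labeled:
                continue  # no evidence from the network yet; leave unassigned
            best = max(taxa, key=lambda t: math.log(prior[t]) + beta * labeled.count(t))
            if labels.get(o) != best:
                labels[o] = best
                changed = True
        if not changed:
            break
    return labels

# Toy abundance table: OTU -> counts across four samples (made-up numbers).
abundance = {
    "otu1": [10, 20, 30, 40],
    "otu2": [11, 19, 33, 41],
    "otu3": [40, 30, 20, 10],
    "otu4": [42, 29, 18, 12],
    "otuX": [9, 22, 29, 43],
}
known = {"otu1": "Firmicutes", "otu2": "Firmicutes",
         "otu3": "Bacteroidetes", "otu4": "Bacteroidetes"}

labels = predict_taxa(build_network(abundance), known)
print(labels["otuX"])  # Firmicutes
```

In this toy table, `otuX` rises and falls with `otu1` and `otu2`, so it is linked only to them and inherits their phylum. A fuller model would typically also weight edges and combine this network evidence with sequence similarity, as the abstract describes.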

Results: Although such associations are intrinsically functional associations, we demonstrate that they are strongly correlated with taxonomic associations and can be combined with sequence comparison methods to predict taxonomic origins of unknown microorganisms at phylum and class levels.

Conclusions: With the ever-increasing accumulation of sequence data from microbial communities, we now take the first step to explore these associations for taxonomic identification beyond sequence similarity.

Availability and Implementation: The source code of TACO is freely available at https://github.com/baharvand/OTU-Taxonomy-Identification. TACO is implemented in C++ and is supported on Linux and MS Windows.

Keywords

metagenomics / 16S rRNA gene / taxonomic profiling / taxonomic prediction / Markov Random Field / OTU co-abundance network

Cite this article

Zohreh Baharvand Irannia, Ting Chen. TACO: Taxonomic prediction of unknown OTUs through OTU co-abundance networks. Quant. Biol., 2016, 4(3): 149‒158 https://doi.org/10.1007/s40484-016-0073-2


SUPPLEMENTARY MATERIALS

The supplementary materials can be found online with this article at DOI 10.1007/s40484-016-0073-2.

ACKNOWLEDGEMENTS

The authors thank Professor Fengzhu Sun for his helpful suggestions. This research was partially supported by NIH Center of Excellence in Genomic Sciences (NIH/HG 2 P50 HG002790-06), NIH/NHGRI 1U01 HG006531-01, NSF/DMS ATD 7031026, and NSFC 91019016.

COMPLIANCE WITH ETHICS GUIDELINES

The authors Zohreh Baharvand Irannia and Ting Chen declare that they have no conflicts of interest.

RIGHTS & PERMISSIONS

2016 Higher Education Press and Springer-Verlag Berlin Heidelberg