CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing

doi:10.1007/s13238-012-2015-8

Protein Cell, 2012, Vol. 3, Issue 2: 148-152. DOI: 10.1007/s13238-012-2015-8

RESEARCH ARTICLE

Guoguang Zhao¹,⁴, Dechao Bu¹,⁴, Changning Liu¹, Jing Li¹, Jian Yang³, Zhiyong Liu¹, Yi Zhao¹, Runsheng Chen¹,²

Abstract

Estimating taxonomic content is a key problem in the analysis of metagenomic sequencing data. However, extracting such content from high-throughput next-generation sequencing data is very time-consuming with currently available software. Here, we present CloudLCA, a parallel lowest common ancestor (LCA) algorithm that significantly improves the efficiency of determining taxonomic composition in metagenomic data analysis. Results show that CloudLCA (1) has a running time that grows nearly linearly with dataset size, (2) displays linear speedup as the number of processors grows, especially for large datasets, and (3) reaches a speed of nearly 215 million reads per minute on a cluster of ten thin nodes. When both run on a fat node, CloudLCA is up to five times faster than MEGAN, a well-known metagenome analyzer, and its peak memory usage is approximately 18.5% of MEGAN's. CloudLCA can be run on a single multiprocessor node or on a cluster. It is expected to become part of MEGAN to accelerate read analysis: it generates the same output as MEGAN, which can be imported directly into MEGAN to complete the downstream analysis. Moreover, CloudLCA is a general-purpose solution for finding the lowest common ancestor and can be applied in other fields that require an LCA algorithm.
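The abstract does not spell out the LCA step itself. For orientation, the Python sketch below shows the general kind of MEGAN-style assignment involved: each read is assigned to the lowest common ancestor of the taxa of its significant hits, and assigned reads are counted per taxon to give a taxonomic profile. Because every read is handled independently, the reads can be split into chunks, processed in parallel (map) and the per-chunk counts summed (reduce). This is a minimal sketch under stated assumptions, not CloudLCA's implementation: the toy taxonomy `PARENT`, the helper names `lineage`, `lca`, `map_chunk`, `chunks` and `taxonomic_profile`, and the chunking scheme are all hypothetical, and hit filtering (for example score thresholds) is assumed to have been done beforehand.

```python
from collections import Counter
from concurrent.futures import Executor
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent); the root maps to None.
# A real analysis would load a full taxonomy such as NCBI's instead.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}

# (read id, taxa of the read's already-filtered significant hits)
Read = Tuple[str, List[str]]


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon` (KeyError if unknown)."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lca(taxa: Iterable[str]) -> Optional[str]:
    """Deepest node shared by all lineages; None for a read with no hits."""
    lineages = [lineage(t) for t in taxa]
    ancestor: Optional[str] = None
    # zip stops at the shortest lineage, so if one hit taxon is an ancestor
    # of another, that ancestor is returned.
    for level in zip(*lineages):
        if any(node != level[0] for node in level):
            break
        ancestor = level[0]
    return ancestor


def map_chunk(chunk: List[Read]) -> Counter:
    """Map step: assign each read in one chunk to its LCA and count per taxon."""
    counts: Counter = Counter()
    for _read_id, hit_taxa in chunk:
        taxon = lca(hit_taxa)
        if taxon is not None:  # reads without hits stay unassigned
            counts[taxon] += 1
    return counts


def chunks(reads: Iterable[Read], size: int) -> Iterator[List[Read]]:
    """Split the read stream into independent chunks of at most `size` reads."""
    if size < 1:
        raise ValueError("chunk size must be positive")
    it = iter(reads)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def taxonomic_profile(reads: Iterable[Read], chunk_size: int = 100_000,
                      executor: Optional[Executor] = None) -> Counter:
    """Reduce step: sum per-chunk counts; chunks run on `executor` if given."""
    mapper = executor.map if executor is not None else map
    total: Counter = Counter()
    for partial in mapper(map_chunk, chunks(reads, chunk_size)):
        total.update(partial)  # Counter.update adds counts
    return total
```

Passing a `concurrent.futures.ProcessPoolExecutor()` as `executor` (inside an `if __name__ == "__main__":` guard) would process chunks in separate processes on one multiprocessor node; on a cluster, a MapReduce framework would distribute the same map and reduce steps. The chunk size trades scheduling overhead against load balance.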

Keywords

CloudLCA / metagenome analysis / cloud computing

Cite this article

Guoguang Zhao, Dechao Bu, Changning Liu, Jing Li, Jian Yang, Zhiyong Liu, Yi Zhao, Runsheng Chen. CloudLCA: finding the lowest common ancestor in metagenome analysis using cloud computing. Protein Cell, 2012, 3(2): 148-152. https://doi.org/10.1007/s13238-012-2015-8