The best practice for microbiome analysis using R

Tao Wen, Guoqing Niu, Tong Chen, Qirong Shen, Jun Yuan, Yong-Xin Liu

Protein Cell ›› 2023, Vol. 14 ›› Issue (10) : 713-725. DOI: 10.1093/procel/pwad024
REVIEW

Abstract

As sequencing technology has matured, many microbiome studies have been published, driving the emergence and advancement of related analysis tools. The R language is a widely used platform for microbiome data analysis because of its powerful functions. However, tens of thousands of R packages and numerous similar analysis tools pose a major challenge for researchers exploring microbiome data, and choosing suitable, efficient, convenient, and easy-to-learn tools from among them has become a problem for many microbiome researchers. We have organized 324 common R packages for microbiome analysis and classified them by application category (diversity, difference, biomarker, correlation and network, functional prediction, and others), which can help researchers quickly find relevant packages. Furthermore, we systematically organized the integrated R packages for microbiome analysis (phyloseq, microbiome, MicrobiomeAnalystR, Animalcules, microeco, and amplicon) and summarized their advantages and limitations, which will help researchers choose the appropriate tools. Finally, we thoroughly reviewed the R packages for microbiome analysis, summarized most of the common analysis content in microbiome research, and formed the most suitable pipeline for microbiome analysis. This paper is accompanied by hundreds of examples with 10,000 lines of code on GitHub, which can help beginners learn and help analysts compare and test different tools. This paper systematically organizes the applications of R in microbiome research, providing an important theoretical basis and practical reference for the development of better microbiome tools in the future. All the code is available on GitHub at github.com/taowenmicro/EasyMicrobiomeR.

Keywords

R package / microbiome / data analysis / visualization / amplicon / metagenome

Cite this article

Tao Wen, Guoqing Niu, Tong Chen, Qirong Shen, Jun Yuan, Yong-Xin Liu. The best practice for microbiome analysis using R. Protein Cell, 2023, 14(10): 713‒725 https://doi.org/10.1093/procel/pwad024

RIGHTS & PERMISSIONS

© The Author(s) 2023. Published by Oxford University Press on behalf of Higher Education Press.