REVIEW

The best practice for microbiome analysis using R

  • Tao Wen 1,2
  • Guoqing Niu 2
  • Tong Chen 3
  • Qirong Shen 2
  • Jun Yuan 2
  • Yong-Xin Liu 1
  • 1. Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
  • 2. The Key Laboratory of Plant Immunity, Jiangsu Provincial Key Lab for Organic Solid Waste Utilization, Jiangsu Collaborative Innovation Center for Solid Organic Waste Resource Utilization, National Engineering Research Center for Organic-based Fertilizers, Nanjing Agricultural University, Nanjing 210095, China
  • 3. National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
junyuan@njau.edu.cn
liuyongxin@caas.cn

Received date: 28 Feb 2023

Accepted date: 02 Apr 2023

Copyright

© The Author(s) 2023. Published by Oxford University Press on behalf of Higher Education Press.

Abstract

As sequencing technology has matured, many microbiome studies have been published, driving the emergence and advancement of related analysis tools. The R language is a widely used platform for microbiome data analysis because of its powerful functions. However, the tens of thousands of R packages and the many similar analysis tools make it challenging for researchers to explore microbiome data. Choosing suitable, efficient, convenient, and easy-to-learn tools from among these packages has become a problem for many microbiome researchers. We organized 324 common R packages for microbiome analysis and classified them by application category (diversity, difference, biomarker, correlation and network, functional prediction, and others), which can help researchers quickly find relevant packages. Furthermore, we systematically compared the integrated R packages for microbiome analysis (phyloseq, microbiome, MicrobiomeAnalystR, Animalcules, microeco, and amplicon) and summarized their advantages and limitations, which will help researchers choose appropriate tools. Finally, we thoroughly reviewed the R packages for microbiome analysis, summarized most of the common analyses in microbiome research, and formed the most suitable pipeline for microbiome analysis. This paper is accompanied by hundreds of examples with 10,000 lines of code on GitHub, which can help beginners learn and help analysts compare and test different tools. This paper systematically reviews the application of R in microbiome research, providing an important theoretical basis and practical reference for the development of better microbiome tools in the future. All the code is available on GitHub at github.com/taowenmicro/EasyMicrobiomeR.

Cite this article

Tao Wen, Guoqing Niu, Tong Chen, Qirong Shen, Jun Yuan, Yong-Xin Liu. The best practice for microbiome analysis using R[J]. Protein & Cell, 2023, 14(10): 713-725. DOI: 10.1093/procel/pwad024

