MetaBIDx: a new computational approach to bacteria identification in microbiomes

Diem-Trang Pham, Vinhthuy Phan

Microbiome Research Reports ›› 2024, Vol. 3 ›› Issue (2) : 25 DOI: 10.20517/mrr.2024.01
Original Article

Abstract

Objectives: This study introduces MetaBIDx, a computational method designed to improve species prediction in metagenomic environments. Accurate species identification in complex microbiomes is difficult because of the large number of sequencing reads generated and the ever-expanding number of bacterial genomes. Bacterial identification is essential for diagnosing disease and tracing outbreaks associated with microbial infections.

Methods: MetaBIDx uses a modified Bloom filter to index reference genomes efficiently and incorporates a novel strategy for reducing false positives: clustering species based on how well their genomes are covered by identified reads. The approach was evaluated against several well-established tools across various datasets. Precision, recall, and F1-score were used to quantify the accuracy of species prediction.
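To make the indexing idea concrete, here is a minimal Python sketch of a standard Bloom filter over k-mers. It is an illustration only: the abstract does not describe how MetaBIDx modifies the Bloom filter, so the class name `KmerBloomFilter`, the hashing scheme, the k-mer length, and the filter size are all assumptions chosen for readability, not the authors' implementation.

```python
import hashlib


class KmerBloomFilter:
    """Standard Bloom filter over k-mers (illustrative; not MetaBIDx's modified variant)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3, k=5):
        # All three parameters are hypothetical defaults for a toy example.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, kmer):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def _kmers(self, sequence):
        # Slide a window of length k across the sequence.
        for i in range(len(sequence) - self.k + 1):
            yield sequence[i:i + self.k]

    def add_genome(self, sequence):
        # Set the bits for every k-mer of a reference sequence.
        for kmer in self._kmers(sequence):
            for pos in self._positions(kmer):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True if all bits are set; may be a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )

    def fraction_of_kmers_found(self, read):
        # Share of the read's k-mers that the filter reports as present.
        kmers = list(self._kmers(read))
        if not kmers:
            return 0.0
        return sum(kmer in self for kmer in kmers) / len(kmers)


index = KmerBloomFilter()
index.add_genome("ACGTACGTTAGCCGATAGGCTA")  # made-up reference sequence
# 1.0: a Bloom filter never misses a k-mer it has stored
hit_fraction = index.fraction_of_kmers_found("GTTAGCCGAT")
```

A Bloom filter trades a small, tunable false-positive rate for compact storage, which is why downstream false-positive filtering of the kind described in this abstract matters.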

Results: MetaBIDx demonstrated superior performance compared to other tools, especially in terms of precision and F1-score. The application of clustering based on approximate coverages significantly improved precision in species identification, effectively minimizing false positives. We further demonstrated that other methods can also benefit from our approach to removing false positives by clustering species based on approximate coverages.
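The effect of coverage-based clustering on precision can be illustrated with a small Python sketch. The abstract does not name the clustering algorithm, so a simple one-dimensional two-cluster k-means is assumed here; the function names, the species labels, and the coverage values are all hypothetical and chosen only to show how dropping the low-coverage cluster changes precision, recall, and F1-score.

```python
def two_means_1d(values, iterations=100):
    """Split 1-D values into a low and a high cluster; True marks the high cluster."""
    if not values:
        return []
    low, high = min(values), max(values)
    if low == high:
        # Nothing to separate: treat every value as high coverage.
        return [True] * len(values)
    labels = []
    for _ in range(iterations):
        # Assign each value to the nearer of the two centroids.
        labels = [abs(v - high) < abs(v - low) for v in values]
        high_vals = [v for v, is_high in zip(values, labels) if is_high]
        low_vals = [v for v, is_high in zip(values, labels) if not is_high]
        new_low = sum(low_vals) / len(low_vals) if low_vals else low
        new_high = sum(high_vals) / len(high_vals) if high_vals else high
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels


def keep_high_coverage_species(coverage):
    """Return the species that fall in the high-coverage cluster."""
    names = list(coverage)
    labels = two_means_1d([coverage[name] for name in names])
    return {name for name, is_high in zip(names, labels) if is_high}


def precision_recall_f1(predicted, truth):
    """Set-based precision, recall, and F1-score for predicted species."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up approximate coverages: three present species, two spurious hits.
coverage = {
    "species_A": 0.82,
    "species_B": 0.75,
    "species_C": 0.68,
    "species_D": 0.04,
    "species_E": 0.02,
}
truth = {"species_A", "species_B", "species_C"}

before = precision_recall_f1(coverage.keys(), truth)
after = precision_recall_f1(keep_high_coverage_species(coverage), truth)
```

In this toy example every species with any read support is reported before filtering, so precision is 3 out of 5; after the low-coverage cluster is removed, only the three true species remain. Real coverage distributions are unlikely to separate this cleanly.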

Conclusion: With a novel approach to reducing false positives and the effective use of a modified Bloom filter to index species, MetaBIDx represents an advancement in metagenomic analysis. The findings suggest that the proposed approach could also benefit other metagenomic tools, indicating its potential for broader application in the field. The study lays the groundwork for future improvements in computational efficiency and the expansion of microbial databases.

Keywords

Bacteria identification / metagenomics / species identification / Bloom filter / clustering

Cite this article

Diem-Trang Pham, Vinhthuy Phan. MetaBIDx: a new computational approach to bacteria identification in microbiomes. Microbiome Research Reports, 2024, 3(2): 25. DOI: 10.20517/mrr.2024.01
