JCVI: A versatile toolkit for comparative genomics analysis

Haibao Tang, Vivek Krishnakumar, Xiaofei Zeng, Zhougeng Xu, Adam Taranto, Johnathan S. Lomas, Yixing Zhang, Yumin Huang, Yibin Wang, Won Cheol Yim, Jisen Zhang, Xingtan Zhang

iMeta, 2024, Vol. 3, Issue 4: e211

DOI: 10.1002/imt2.211
METHOD

Abstract

The life cycle of a genome build spans the interlocking pillars of assembly, annotation, and comparative genomics, which together drive biological insights. While tools exist to address each pillar separately, there is a growing need for tools that integrate the different pillars of a genome project holistically. For example, comparative approaches can provide quality control of an assembly or annotation; genome assembly, in turn, can help identify artifacts that may complicate the interpretation of genome comparisons. JCVI is a versatile Python library that offers a suite of tools that excel across these pillars. Featuring a modular design, the JCVI library provides high-level utilities for tasks such as format parsing, graphics generation, and manipulation of genome assemblies and annotations. Genomics algorithms supported by the library, such as MCscan and ALLMAPS, are widely employed in building genome releases and in producing publication-ready figures for quality assessment and evolutionary inference. Developed and maintained collaboratively, the JCVI library emphasizes quality and reusability.
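MCscan, mentioned above, detects synteny blocks: runs of homologous gene pairs ("anchors") whose order is conserved between two genomes. The sketch below illustrates that collinearity idea with a generic quadratic dynamic program over anchors. It is not JCVI's implementation; the function name `best_collinear_chain`, the unit score per anchor, and the `max_gap` cutoff are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# An anchor is a homologous gene pair: (gene rank on genome A, gene rank on genome B).
Anchor = Tuple[int, int]


def best_collinear_chain(anchors: List[Anchor], max_gap: int = 20) -> List[Anchor]:
    """Return the longest chain of anchors whose ranks increase on both genomes.

    Illustrative sketch only (not JCVI code): every anchor scores 1, and
    consecutive anchors in a chain may be at most `max_gap` genes apart on
    either genome (`max_gap` is a hypothetical parameter).
    """
    if not anchors:
        return []
    pts = sorted(set(anchors))  # sort by rank on genome A, then genome B
    n = len(pts)
    score = [1] * n  # best chain length ending at pts[k]
    prev = [-1] * n  # back-pointer to the previous anchor in that chain
    for k in range(n):
        ak, bk = pts[k]
        for m in range(k):
            am, bm = pts[m]
            # Extend only if strictly collinear and within the gap limit.
            if am < ak and bm < bk and ak - am <= max_gap and bk - bm <= max_gap:
                if score[m] + 1 > score[k]:
                    score[k] = score[m] + 1
                    prev[k] = m
    # Trace back from the highest-scoring end point.
    k = max(range(n), key=lambda idx: score[idx])
    chain = []
    while k != -1:
        chain.append(pts[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    # (4, 40) is out of order relative to its neighbours, so it is skipped.
    anchors = [(1, 10), (2, 11), (3, 12), (4, 40), (5, 13), (6, 14)]
    print(best_collinear_chain(anchors))
    # prints [(1, 10), (2, 11), (3, 12), (5, 13), (6, 14)]
```

The all-pairs scan keeps the example short; practical synteny tools typically restrict comparisons to nearby anchors and use richer scores (for example, alignment E-values), which this sketch deliberately omits.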

Keywords

comparative genomics / genome annotation / genome assembly / genomic data / visualization

Cite this article

Haibao Tang, Vivek Krishnakumar, Xiaofei Zeng, Zhougeng Xu, Adam Taranto, Johnathan S. Lomas, Yixing Zhang, Yumin Huang, Yibin Wang, Won Cheol Yim, Jisen Zhang, Xingtan Zhang. JCVI: A versatile toolkit for comparative genomics analysis. iMeta, 2024, 3(4): e211. DOI: 10.1002/imt2.211

RIGHTS & PERMISSIONS

© 2024 The Authors. iMeta published by John Wiley & Sons Australia, Ltd on behalf of iMeta Science.
