The first high-quality genome assembly and annotation of Lantana camara, an important ornamental plant and a major invasive species
This study presents the first annotated, haplotype-resolved, chromosome-scale genome of Lantana camara, a flowering shrub native to Central America that is both a widely grown ornamental plant and a major invasive species. Despite its widespread cultivation and ecological impact, the lack of a high-quality genome has hindered investigation of the traits underlying both its ornamental value and its invasiveness. This research fills that gap in genomic resources for L. camara, which are needed for both ornamental breeding programs and invasive species management. Whole-genome and transcriptome sequencing were used to elucidate the genetic complexity of the diploid L. camara breeding line UF-T48. The genome was assembled de novo from HiFi and Hi-C reads, yielding two phased genome assemblies with Benchmarking Universal Single-Copy Orthologs (BUSCO) scores of 97.7%, indicating high completeness. All 22 chromosomes were assembled, with pseudochromosomes averaging 117 Mb. The assemblies contained 29 telomeres and abundant repetitive sequences, primarily long terminal repeat transposable elements. Genome annotation identified 83,775 protein-coding genes, 83% of which were functionally annotated. In particular, the study mapped 42 anthocyanin and carotenoid candidate gene clusters and 12 herbicide target genes to the assembly, identifying 38 genes spread across the genome that are integral to flower color development and 53 genes for herbicide targeting in L. camara. This comprehensive genomic study not only enhances the understanding of L. camara’s genetic makeup but also sets a precedent for genomic research in the Verbenaceae family, offering a foundation for future studies in plant genetics, conservation, and breeding.
Lantana / Chromosome-length genome assembly / Hi-C