The nature of complex structural variations in tomatoes

Xue Cui, Yuxin Liu, Miao Sun, Qiyue Zhao, Yicheng Huang, Jianwei Zhang, Qiulin Yao, Hang Yin, Huixin Zhang, Fulei Mo, Hongbin Zhong, Yang Liu, Xiuling Chen, Yao Zhang, Jiayin Liu, Youwen Qiu, Mingfang Feng, Xu Chen, Hossein Ghanizadeh, Yao Zhou, Aoxue Wang

Horticulture Research ›› 2025, Vol. 12 ›› Issue (7) : 107

DOI: 10.1093/hr/uhaf107

Abstract

Structural variations (SVs) in repetitive sequences can often be localized only to a broad region because their breakpoints are imprecise, which leads to classification errors and inaccurate trait analysis. By manually inspecting 4532 variant regions identified by integrating 14 detection pipelines between two tomato genomes, we generated an SV benchmark at base-pair resolution. Evaluated against this benchmark, all pipelines yielded F1-scores below 53.77%, underscoring the urgent need for advanced detection algorithms in plant genomics. By analyzing the alignment features of the repetitive sequences in each region, we summarized four patterns of SV breakpoints and found that deviations in breakpoint identification were primarily due to copy misalignment. Based on the similarities among copies, we identified 1635 bona fide SVs with precise breakpoints, including substitutions (223), which should be treated as a fundamental SV type, alongside insertions (780), deletions (619), and inversions (13), all of which occurred preferentially within AT-repeat regions of regulatory loci. This precise resolution of complex SVs will foster genome analysis and crop improvement.
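
The F1-score reported for each pipeline is the harmonic mean of precision and recall against the benchmark. The following is a minimal Python sketch of that calculation, assuming the counts of true positives, false positives, and false negatives have already been obtained from a comparison against the benchmark. The function name `f1_score` and the example counts are hypothetical illustrations, not values or code from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    tp: calls that match a benchmark SV (true positives)
    fp: calls with no matching benchmark SV (false positives)
    fn: benchmark SVs that were not called (false negatives)
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
score = f1_score(tp=50, fp=50, fn=50)
print(f"{score:.2%}")  # prints "50.00%"
```

Because F1 penalizes both missed SVs and spurious calls, a pipeline cannot score well by being only sensitive or only conservative.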

Cite this article

Xue Cui, Yuxin Liu, Miao Sun, Qiyue Zhao, Yicheng Huang, Jianwei Zhang, Qiulin Yao, Hang Yin, Huixin Zhang, Fulei Mo, Hongbin Zhong, Yang Liu, Xiuling Chen, Yao Zhang, Jiayin Liu, Youwen Qiu, Mingfang Feng, Xu Chen, Hossein Ghanizadeh, Yao Zhou, Aoxue Wang. The nature of complex structural variations in tomatoes. Horticulture Research, 2025, 12(7): 107. DOI: 10.1093/hr/uhaf107

Acknowledgments

This work was partially supported by grants from the National Natural Science Foundation of China (grant nos. U22A20495 and 32072588 to A.W.), the Intelligent Molecular Breeding project of the Department of Agriculture and Rural Affairs of Heilongjiang Province (to A.W.), the Science & Technology Specific Projects in Agricultural High-tech Industrial Demonstration Area of the Yellow River Delta (grant no. 2022SZX13 to Y.Z.), and the Strategic Priority Research Program of the Chinese Academy of Sciences (grant no. XDA26030102 to Y.Z.). We also thank Zhigui Bao of the Max Planck Institute for Biology Tübingen for improving the quality of the manuscript.

Author contributions

A.W. and Y.Z. designed the study and experiments. X.C., M.S., and Q.Z. performed data analyses and prepared the figures. Y.H. and J.Z. assisted with genome assembly. Yuxin.L. collected samples. Q.Y., H.Y., H.Z., F.M., H.Zhong, Y.L., Xiuling.C., Y.Zhang, J.L., Y.Q., M.F., and X.Chen participated in discussions. X.C. wrote the manuscript and Y.Z., H.G., Yuxin.L., and A.W. revised the manuscript.

Data availability

The sequencing data and assembled genome generated in this study are publicly available in the Sequence Read Archive (https://ncbi.nlm.nih.gov/sra) under BioProject PRJNA1135477. All SV datasets needed to reproduce the results of this study are available in the Article and Supplementary Data. All code used in the manuscript is publicly available on GitHub (https://github.com/xuecui1997/SVs_inspection).

Conflict of interest statement

The authors declare no conflicts of interest.

Supplementary data

Supplementary data is available at Horticulture Research online.
