Graph pan-genome advances genetic discoveries and the improvement of eggplant

Chuying Yu , Weiliu Li , Yaqin Jiang , Qihong Yang , Guiyun Gan , Liangyu Cai , Wenjia Li , Yikui Wang

Horticulture Research, 2026, Vol. 13, Issue 1: 248

DOI: 10.1093/hr/uhaf248

Abstract

Eggplant is one of the most important solanaceous vegetable crops worldwide. To explore its genomic diversity, we assembled two T2T-level reference genomes, one from the African eggplant ‘Y11’ (Solanum aethiopicum L.) and one from the cultivated variety ‘Gui5’ (Solanum melongena L.), with genome sizes of 1.10 and 1.13 Gb, respectively. The contig N50 lengths are 94.2 and 93.9 Mb, and 37 324 and 40 300 protein-coding genes were annotated, respectively. We also sequenced 238 germplasms, primarily local and cultivated varieties from China, Southeast Asia, Europe, and Africa, and identified 7 853 531 high-quality single nucleotide polymorphisms (SNPs). Phylogenetic and population-structure analyses suggest that Chinese eggplants were domesticated later than those in Southeast Asia and subsequently diverged into northern and southern groups within China, which evolved relatively independently with limited gene flow between them. The genetic diversity of Chinese eggplants is significantly lower than that of eggplants from Southeast Asia and Europe. Using 22 representative accessions and four chromosome-level genomes, we constructed an Asian-representative eggplant pan-genome, assembling 463.94 Mb of nonreference sequences. Of these sequences, 38.3% are core genes, 46.9% are dispensable genes, and 14.9% are unique genes. Genes showing presence/absence variation were strongly associated with stress resistance in eggplant. Genome-wide association studies identified 946 SNPs and 9605 genes significantly associated with 10 important traits. Notably, genes involved in zeatin biosynthesis, closely linked to plant auxins, significantly affect fruit size and shape, playing a crucial role in eggplant yield. These high-quality reference genomes, together with the pan-genome, will provide valuable insights for advancing eggplant breeding.
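
The contig N50 reported in the abstract is a standard contiguity metric: sort contigs from longest to shortest, accumulate their lengths, and take the length of the contig at which the running total first reaches half of the total assembly length. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy contig lengths are illustrative assumptions and are not values from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("need at least one contig length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to >= 50% of the
        # assembly defines the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running sum always reaches half")


# Toy lengths (arbitrary units), chosen only to illustrate the calculation.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A higher N50 for a similar genome size indicates that more of the assembly sits in long, uninterrupted contigs.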

Cite this article

Chuying Yu, Weiliu Li, Yaqin Jiang, Qihong Yang, Guiyun Gan, Liangyu Cai, Wenjia Li, Yikui Wang. Graph pan-genome advances genetic discoveries and the improvement of eggplant. Horticulture Research, 2026, 13(1): 248. DOI: 10.1093/hr/uhaf248

Acknowledgments

This work was supported by the Guangxi Key Research and Development Program Project (GuikeAB 25069493), the National Natural Science Foundation of China (32360764), the Guangxi Innovation Team of National Modern Agricultural Technology System (nycytxgxcxtd-2023-10-1), and the Nanning Science Research and Technology Development Program Project (NNKJ202402). We thank Shaofang He and Xinxin Yi for their guidance on the data analysis.

Author Contributions

C.Y.: Methodology, Validation, Writing—original draft, Writing—review & editing. W.L.: Investigation, Resources, Data curation, Formal Analysis. Q.Y.: Software, Validation, Investigation, Visualization. Y.J.: Data curation. G.G.: Resources. L.C.: Investigation. W.L.: Supervision, Project administration. Y.W.: Conceptualization, Methodology, Validation, Supervision, Project administration, Funding acquisition.

Data availability statement

The sequencing data presented in this study (PacBio whole-genome sequencing data for assembly, resequencing data, Hi-C data, mixed-sample RNA-seq data for annotation, the T2T genome assembly data, and the whole-genome sequencing data) have been deposited in the NCBI repository under accession numbers PRJNA1173246, PRJNA1274827, PRJNA1275635, and PRJNA1155721.

Conflicts of interest statement

No competing interest is declared.

Supplementary material

Supplementary material is available at Horticulture Research online.
