The pangenome enhances the understanding of the genetic diversity of papaya

Min Yang, Chenping Zhou, Xiangdong Kong, Ruibin Kuang, Chuanhe Liu, Xiaming Wu, Ze Xu, Han He, Yuerong Wei

Horticulture Research ›› 2026, Vol. 13 ›› Issue (2) : 282

DOI: 10.1093/hr/uhaf282

Abstract

Papaya (Carica papaya L.) is a nutritionally and medicinally important tropical fruit crop, yet its genetic improvement has been limited by insufficient genomic resources. In this study, we constructed chromosome-level genomes for three key varieties (Zhufeng, T3, and T5) and integrated them with three existing assemblies to build a comprehensive pangenome with graph-based, linear, and syntelog-based representations. The syntelog-based pangenome revealed 24 453 syntelog groups (SGs). By aligning resequencing data from 222 accessions to the graph-based pangenome, we identified 26 173 structural variations (SVs), including a functionally relevant 94-bp deletion in the RETARDED ROOT GROWTH (RRG) gene in the T3 genome. This deletion reduces the expression level of RRG in T3. Further phenotypic analysis showed that RRG can influence papaya root length by promoting the proliferation of root meristem cells and inhibiting cell elongation. Additionally, the linear pangenome uncovered 5273 translocations and 1440 inversions, significantly expanding the known SV repertoire in papaya. This study provides a critical genomic resource for deciphering domestication-related traits and accelerating marker-assisted breeding, ultimately advancing the genetic improvement of papaya.

Cite this article

Min Yang, Chenping Zhou, Xiangdong Kong, Ruibin Kuang, Chuanhe Liu, Xiaming Wu, Ze Xu, Han He, Yuerong Wei. The pangenome enhances the understanding of the genetic diversity of papaya. Horticulture Research, 2026, 13(2): 282 DOI:10.1093/hr/uhaf282

Acknowledgements

This work was supported by the General Program of the National Natural Science Foundation of China (grant 32572974), the ‘Young and Middle-aged Academic Leaders’ training fund project of Guangdong Academy of Agricultural Sciences (grant R2023PY-JX005), the Cultivation Project of Fruit Tree Research Institute, Guangdong Academy of Agricultural Sciences (grant 23107), the Guangdong Province Rural Revitalization Strategy Special Fund-Seed Industry Revitalization Action Project (grant 2024-NPY-00-028), and the Guangzhou Municipal Science and Technology Project (grant 2024A04J6951).

Author contributions

M.Y. and Y.W. planned and designed the research. M.Y. and C.Z. performed most of the experiments and bioinformatics analyses. X.K., R.K., X.W., C.L., Z.X., and H.H. assisted with data analysis, sample collection, and extraction of total RNA or DNA. M.Y. and Y.W. wrote the manuscript. All authors contributed to the article and approved the submitted version.

Data availability

All sequencing data generated in this study were deposited at the National Center for Biotechnology Information (NCBI) under BioProject ID PRJNA1154410. The sequencing data of Zihui were downloaded from the NCBI (BioProject: PRJNA968045). The whole-genome resequencing data were downloaded from the NCBI under BioProject ID PRJNA970517.

Conflicts of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

Supplementary material is available at Horticulture Research online.

