A centromere map based on super pan-genome highlights the structure and function of rice centromeres
Yang Lv, Congcong Liu, Xiaoxia Li, Yueying Wang, Huiying He, Wenchuang He, Wu Chen, Longbo Yang, Xiaofan Dai, Xinglan Cao, Xiaoman Yu, Jiajia Liu, Bin Zhang, Hua Wei, Hong Zhang, Hongge Qian, Chuanlin Shi, Yue Leng, Xiangpei Liu, Mingliang Guo, Xianmeng Wang, Zhipeng Zhang, Tianyi Wang, Bintao Zhang, Qiang Xu, Yan Cui, Qianqian Zhang, Qiaoling Yuan, Noushin Jahan, Jie Ma, Xiaoming Zheng, Yongfeng Zhou, Qian Qian, Longbiao Guo, Lianguang Shang
Rice (Oryza sativa) is a significant crop worldwide with a genome shaped by various evolutionary factors. Rice centromeres are crucial for chromosome segregation and contain some previously unreported genes. Because centromere regions are diverse and complex, a comprehensive understanding of rice centromere structure and function at the population level is needed. We constructed a high-quality centromere map based on the rice super pan-genome consisting of a 251-accession panel comprising both cultivated and wild species of Asian and African rice. We showed that rice centromeres harbor diverse CentO satellite repeats, which vary across chromosomes and subpopulations, reflecting their distinct evolutionary patterns. We also revealed that long terminal repeats (LTRs), especially young Gypsy-type LTRs, are abundant in the peripheral CentO-enriched regions and drive rice centromere expansion and evolution. Furthermore, high-quality genome assemblies and a complete telomere-to-telomere (T2T) reference genome enabled us to obtain more centromeric genome information, although mapping and cloning centromere genes remain challenging. We investigated the association between structural variations and gene expression in the rice centromere. A centromere gene, OsMAB, which positively regulates rice tiller number, was further confirmed by expression quantitative trait loci analysis, haplotype analysis, and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 methods. By revealing new insights into the evolutionary patterns and biological roles of rice centromeres, our findings will facilitate future research on centromere biology and crop improvement.
centromere / super pan-genome / CentO satellite repeat / rice