The China National GeneBank Sequence Archive (CNSA) 2024 update

Weiwen Wang; Cong Tan; Ling Li; Xia Li; Lei Zhang; Xiaoqiang Li; Jieyu Wang; Ziyi He; Tao Yang; Kailong Ma; Qingjiang Hu; Wenzhen Yang; Zhiyong Li; Mingwen Zhang; Wensi Du; Fan Yang; Zhicheng Xu; Xizheng Ma; Jiawei Tong; Jia Cai; Cong Hua; Fengzhen Chen; Lijin You; Liang Li; Wenjun Zeng; Bo Wang; Xun Xu; Xiaofeng Wei

doi:10.1093/hr/uhaf036

Horticulture Research, 2025, Vol. 12, Issue 5: 36. DOI: 10.1093/hr/uhaf036


Weiwen Wang ¹,²,³,‡; Cong Tan ⁴,⁵,‡; Ling Li ¹,²,³,‡; Xia Li ²,³; Lei Zhang ⁵; Xiaoqiang Li ⁵; Jieyu Wang ²,³; Ziyi He ⁵; Tao Yang ²,³; Kailong Ma ²,³; Qingjiang Hu ²,³; Wenzhen Yang ²,³; Zhiyong Li ⁵; Mingwen Zhang ⁵; Wensi Du ²,³; Fan Yang ²,³; Zhicheng Xu ²,³; Xizheng Ma ²,³; Jiawei Tong ⁵; Jia Cai ⁵; Cong Hua ⁵; Fengzhen Chen ³; Lijin You ²,³; Liang Li ²,³; Wenjun Zeng ²,³; Bo Wang ²,³,*; Xun Xu ³,*; Xiaofeng Wei ¹,²,³,*


Abstract

The China National GeneBank Sequence Archive (CNSA) is an open, freely accessible, curated data repository built for archiving, sharing, and reusing multiomics data. The remarkable advancement of sequencing technologies has triggered a paradigm shift in life science research, but it also poses tremendous challenges for the research community in data management and reusability. Newer technologies such as spatial transcriptome sequencing have brought an unprecedented explosion in sequence data and new requirements for data archiving. CNSA was established in 2017 as a fundamental infrastructure offering multiomics data archiving to the worldwide research community. Here, we present the latest enhancements to CNSA: the dramatic growth in data of varied types, the newest features and services implemented in CNSA, and continued efforts to support global cooperation in biodiversity preservation and utilization. CNSA provides public archiving and open-sharing services for sequencing data and associated metadata, covering genomic, transcriptomic, metabolomic, and proteomic data from the single-cell (including spatially resolved) level to the individual and population levels, as well as further analyzed results. As of 2024, CNSA has archived >16.3 petabytes of data and provided data curation, preservation, and open-sharing services for >1581 publications from >560 institutions. It plays a pivotal role in supporting global scientific projects such as the 10 000 Plant Genomes Project. To date, CNSA has been recommended by academic publishers including Cell, Elsevier, and Oxford University Press. CNSA is accessible at https://db.cngb.org/cnsa/.

Cite this article


Weiwen Wang, Cong Tan, Ling Li, Xia Li, Lei Zhang, Xiaoqiang Li, Jieyu Wang, Ziyi He, Tao Yang, Kailong Ma, Qingjiang Hu, Wenzhen Yang, Zhiyong Li, Mingwen Zhang, Wensi Du, Fan Yang, Zhicheng Xu, Xizheng Ma, Jiawei Tong, Jia Cai, Cong Hua, Fengzhen Chen, Lijin You, Liang Li, Wenjun Zeng, Bo Wang, Xun Xu, Xiaofeng Wei. The China National GeneBank Sequence Archive (CNSA) 2024 update. Horticulture Research, 2025, 12(5): 36 DOI:10.1093/hr/uhaf036


Acknowledgements

This study was supported by the Guangdong Genomics Data Center (2021B1212100001), Shenzhen Science and Technology Program (KQTD20230301092839007), Biological Breeding-National Science and Technology Major Project (2023ZD04073), and the China National GeneBank.

Author contributions

L.Y., C.H., and F.C. conceptualized the data repository. W.W., L.L., X.L., L.Z., and X.L. curated the data. W.Y., Z.L., Q.H., W.D., F.Y., J.T., and J.C. constructed the data archive and sharing system. L.L. and W.Z. provided hardware support. X.M., Z.X., K.M., and J.W. processed the data. Z.Y., M.Z., and T.Y. operated the data repository. X.X., B.W., and X.W. supervised the project. X.X., X.W., B.W., W.Z., and C.T. acquired the funding. W.W. and C.T. drafted and finalized the manuscript. All authors reviewed and approved the final manuscript.

Data availability

All resources are available at https://db.cngb.org/cnsa/.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Supplementary Data

Supplementary data is available at Horticulture Research online.
