Human pangenome: far-reaching implications in precision medicine

Yingyan Yu , Hongzhuan Chen

Front. Med. ›› 2024, Vol. 18 ›› Issue (2) : 403 -409. DOI: 10.1007/s11684-023-1039-1
COMMENT

1 Introduction

On May 11, 2023, the journal Nature announced that a first draft of the human pangenome reference had been constructed [1]. This research was carried out by the Human Pangenome Reference Consortium (HPRC). One month later, on June 14, 2023, a pangenome reference of 36 Chinese populations, produced by the Chinese Pangenome Consortium (CPC), was also published in Nature [2]. As the scientists pointed out, the human reference genome published by the Human Genome Project (HGP) has served as a reference blueprint for genomic research. However, its limitations have gradually emerged, and a single reference can hardly reflect the genomic diversity of the human species. In January 2010, Chinese scientists first built the sequence map of the human pangenome [3]. The genetic material for the HPRC draft was collected from 47 genetically diverse individuals, which makes the draft more complete than the single human reference genome released 20 years ago. However, this draft is only an initial achievement of the HPRC, whose ultimate goal is to complete genome research on no fewer than 350 individuals who embody human genetic diversity. Meanwhile, the CPC released a collection of 116 high-quality, haplotype-phased de novo assemblies based on 58 core samples representing 36 ethnic minority groups from China.

2 From the discovery of the DNA double helix to the Human Genome Project

Any discussion of the human genome must begin with the discovery of the DNA double helix. The year 2023 marked the 70th anniversary of that discovery. On April 25, 1953, Nature simultaneously published three papers on the structure of DNA, from Watson and Crick, from Wilkins and colleagues, and from Franklin and Gosling [4–6]. In a short paper with only one illustration, Watson and Crick proposed that DNA is arranged as a double helix and that this structure carries genetic material capable of replication. The publication of the double helix opened the way for the next 50 years of work on deciphering the genetic code. In fact, while Watson and Crick were analyzing the structure of DNA at the University of Cambridge, Wilkins and Franklin at King’s College London were also working on DNA. The memoirs of Watson and Crick and other historical records show that Franklin’s laboratory took an X-ray diffraction image of DNA in May 1952, the famous “Photo 51” of B-form DNA [7]. In January 1953, Wilkins, Franklin’s colleague at King’s College, showed a copy of the photo to Watson and Crick. The two realized at the time that the “X” shape in the photo meant that DNA was a helix, and possibly a double helix. In February 1953, Watson and Crick successfully constructed a model of the DNA double helix and invited Franklin to their laboratory at Cambridge to see it. In 1962, Watson, Crick, and Wilkins shared the Nobel Prize in Physiology or Medicine [8]. The discovery of the double-helix structure has been hailed as “a symbol of biology” that opened a new era, and it has become a milestone in the history of science comparable to Darwin’s theory of evolution.

In 1977, Sanger and his colleagues at the University of Cambridge developed chain-termination DNA sequencing (the first-generation sequencing method), which made it possible for the first time to read complete DNA sequence information [9]. Building on this technology, scientists from nearly 20 genome centers in six countries (the United States, Britain, China, France, Germany, and Japan) launched the HGP in the 1990s [10]. Chinese scientists contributed one percent of the work to this great project [11,12]. In the early stages of the HGP, genome sequencing technology was immature and relatively expensive. After more than ten years of hard work, scientists completed the first human reference genome sequence in April 2003 [13]. The HGP is regarded as one of the three great projects in the history of science, alongside the Manhattan Project and the Apollo Moon Project [14].

However, as research advanced, traditional Sanger sequencing could no longer meet the demands of genomics. Sequencing larger genomes, such as those of animals and humans, urgently required higher throughput, faster speed, and lower cost. In 2005, next-generation sequencing (second-generation sequencing) emerged [15]. With its adoption, research on human disease genomes and cancer genomes entered a period of rapid development [16].

The human reference genome constructed by the HGP in 2003 actually covered only a little more than 90% of the human genome sequence [17]. Owing to the limitations of first-generation sequencing technology at that time, hundreds of sequence gaps remained to be filled [18,19]. Second-generation sequencing reduced the cost and greatly improved the efficiency of whole-genome sequencing (WGS) while maintaining high accuracy, which led to explosive growth in WGS data (including disease and cancer genomes). The accumulation of such data has given scientists the opportunity to discover DNA sequences that are absent from the published reference genome (often referred to as unmapped sequences). Researchers have gradually realized that a single reference genome cannot fully reflect the genetic diversity of humankind [20].
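The notion of "unmapped" data can be made concrete with a small sketch. It assumes alignments stored as SAM text, where bit 0x4 of the FLAG field marks a read that did not align to the reference. The function name `unmapped_reads` and the toy records are hypothetical, and the sketch only illustrates the idea of collecting reads that fail to align; it is not the procedure used in any of the studies cited here.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_BIT = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for every unmapped SAM record."""
    for line in sam_lines:
        # Skip empty lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 is FLAG
        if flag & UNMAPPED_BIT:
            yield fields[0], fields[9]  # column 1 is QNAME, column 10 is SEQ


# Hypothetical toy records: read2 carries flag 4 and has no reference position.
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAT\tIIIIIIII",
    "read3\t16\tchr2\t250\t60\t8M\t*\t0\t0\tGGCATGCA\tIIIIIIII",
]

found = list(unmapped_reads(toy_sam))
print(found)
```

In practice, sequences recovered this way would still need assembly, contamination filtering, and validation before they could be treated as candidate novel human sequence.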

3 The origin and development of the pangenome

A pangenome is a collection of DNA sequences that captures the variation in genetic material among individuals of a species, that is, the gene pool of all individuals of the species. Understanding the pangenome requires three basic elements: core genes, distributed genes (also called dispensable or variable genes), and population-specific genes. Core genes are shared by all individuals of a species; in humans, they are the genes that everyone has. Core genes therefore govern the basic biological functions of a species and determine its main phenotypic characteristics. Distributed genes, also known as non-essential genes, are present in some individuals but absent in others. Population-specific genes are present only in individuals of a certain ethnic group and absent in individuals of other groups. Distributed and population-specific genes are not essential for the basic life activities of a species, but they may be involved in regulating secondary metabolism or in differential responses to environmental pressure, which may confer or increase a survival advantage [20]. Thus, whereas traditional genomic analysis has helped scientists find many genetic mutations, pangenome research can find missing genetic components, large structural variants (SVs), and even novel presence-absence variations (PAVs) of genes (Fig. 1).
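The three gene categories can be illustrated with a minimal sketch that labels genes from a presence/absence table. The function `classify_genes`, the individual and population labels, and the gene names are all hypothetical toy inputs. The rule used here (a non-core gene counts as population-specific when all of its carriers belong to a single population) is a simplifying assumption for illustration rather than a definition taken from the cited studies.

```python
from typing import Dict, Set


def classify_genes(
    presence: Dict[str, Set[str]],
    populations: Dict[str, str],
) -> Dict[str, str]:
    """Label each gene as core, distributed, or population-specific.

    presence:    individual -> set of genes found in that individual
    populations: individual -> population label
    """
    individuals = list(presence)
    all_genes = set().union(*presence.values())
    n_populations = len(set(populations.values()))
    labels: Dict[str, str] = {}
    for gene in sorted(all_genes):
        carriers = [ind for ind in individuals if gene in presence[ind]]
        if len(carriers) == len(individuals):
            labels[gene] = "core"  # present in every individual
            continue
        carrier_pops = {populations[ind] for ind in carriers}
        if n_populations > 1 and len(carrier_pops) == 1:
            labels[gene] = "population-specific"  # carriers all in one group
        else:
            labels[gene] = "distributed"  # present in some, absent in others
    return labels


# Hypothetical toy data: four individuals from two populations.
presence = {
    "ind1": {"geneA", "geneB", "geneC"},
    "ind2": {"geneA", "geneB"},
    "ind3": {"geneA", "geneD"},
    "ind4": {"geneA", "geneB", "geneD"},
}
populations = {"ind1": "pop1", "ind2": "pop1", "ind3": "pop2", "ind4": "pop2"}

labels = classify_genes(presence, populations)
for gene, label in labels.items():
    print(gene, label)
```

With real cohorts the thresholds are usually softened (for example, treating a gene as core when it is present in nearly all individuals) to tolerate sequencing and annotation errors.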

The “pan” in pangenome is derived from ancient Greek and means “whole.” The concept first appeared in a report on the Streptococcus agalactiae genome in 2005. In S. agalactiae, each bacterial strain may have around 2000 genes, but the complete gene repertoire of the species integrates to several thousand genes [21]. Bacterial genomes contain many variants. A single genome does not reflect how genetic variability drives pathogenesis within a bacterial species, and it limits genome-wide screens for vaccine candidates or antimicrobial targets. A pangenome can be used to analyze the genetic diversity of bacterial or viral strains and to gain in-depth insight into the relationship among each strain’s pathogenicity, virulence, and drug resistance. Scientists subsequently extended pangenomic studies to plant, animal, and human genomes [22]. On April 25, 2018, Nature published a pangenome analysis of 3010 accessions of Asian cultivated rice, in which Wei’s team from Shanghai Jiao Tong University participated [23]. Rice is an important crop for humankind. In the past, selecting strains to improve yield and tolerance to harsh environments relied mainly on experience and repeated large-scale combinatorial hybridization. With the pangenome of 3010 rice accessions analyzed, the required strain resources can be selected precisely for hybridization. For example, hybridizing seed resources that tolerate flooding in the early growth period and resist drought in the late growth period will significantly increase the success rate of breeding rice with the desired traits. In January 2010, Chinese scientists integrated de novo assemblies of an Asian and an African genome with the NCBI reference human genome as a step toward constructing the human pan-genome. They identified ~5 Mb of novel sequences not present in the reference genome in each of these assemblies and proposed that an extensive amount of novel sequence contributes to the genetic variation of the pan-genome [3]. For the human genome, the amount of sequencing data is far greater than for microbial (bacterial or viral) and plant genomes [24,25]. Without breakthroughs in methodology, it would be difficult to carry out human pangenome research on a large scale.

4 Technical development of the human pangenome and applications in cancer research

Although ever-increasing human WGS data provide opportunities for pangenome analysis, the sheer volume of these data is a challenge. For example, WGS data for one person at a sequencing depth of 30-fold amount to as much as 90 Gb. Because pangenome research aims to analyze data from as many individuals as possible, it places high demands on both hardware and software. In terms of hardware, high-performance computer clusters and large-capacity servers are required. In terms of software, pipelines for automated analysis of the human pangenome need to be developed. At the current stage, there are two analysis strategies: the linear pangenome and the graphic (graph-based) pangenome [26]. As early as five years ago, a joint research team from Shanghai Jiao Tong University and Shanghai University of Traditional Chinese Medicine, comprising clinical oncologists, innovative drug researchers, and bioinformatics staff and led by Yu, Zhu, Chen, and Wei, developed an automated pangenomic analysis pipeline named the HUman Pan-genome ANalysis (HUPAN) tool. The pipeline runs on the high-performance computer clusters at Shanghai Jiao Tong University and is suitable for analyzing human WGS data on a large scale. The work was published in Genome Biology, an authoritative international journal on genomics, on July 31, 2019 [27]. It was awarded the prize of “2019 top ten algorithms and tools for bioinformatics in China” for its significant role in promoting human pangenomic studies. In addition, several pipelines for constructing graphic pangenomes have been published [28,29]. A graphic pangenome is a compact representation of a set of genome sequences in which similar sequences are compacted into common nodes and variations are presented as separate nodes [30]. The HPRC draft human pangenome reference was built with the graphic pangenomic method and covers WGS data from 47 samples of worldwide populations, although East Asian samples are underrepresented. The draft added 119 Mb of euchromatic polymorphic sequences and 1115 gene duplications relative to the existing reference GRCh38. Roughly 90 Mb of the additional base pairs are derived from structural variation [1]. Recently, the CPC released its pangenome reference of 36 Chinese populations, composed of 116 haplotype-phased de novo assemblies based on 58 samples representing 36 ethnic minority groups from China. The CPC Phase I data added 189 Mb of euchromatic polymorphic sequences and 1367 protein-coding gene duplications to GRCh38 and demonstrated great potential to shed new light on human evolution and to recover missing heritability in complex trait and disease mapping [2]. Scientists from the HPRC pointed out that using the human pangenome as a reference for WGS research will lead to more accurate discovery of genetic variation, especially larger structural variation, and will benefit all areas of human genetics research [1,31]. Identifying genetic variants associated with human disease will then become more sensitive and specific, directly improving disease diagnosis and treatment.
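The idea of compacting shared sequence into common nodes and keeping variation in separate nodes can be shown with a deliberately small sketch. It assumes the input haplotypes are already aligned and of equal length, so only substitutions are represented. The function `build_toy_graph` and the sequences are hypothetical, and real graph construction tools handle insertions, deletions, and large structural variants in ways this toy does not attempt.

```python
from typing import Dict, List, Tuple


def build_toy_graph(
    haplotypes: Dict[str, str],
) -> Tuple[Dict[int, str], Dict[str, List[int]]]:
    """Return (nodes, paths) for pre-aligned, equal-length haplotypes.

    nodes: node id -> sequence segment
    paths: haplotype name -> ordered list of node ids
    """
    names = list(haplotypes)
    seqs = [haplotypes[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("toy example expects equal-length, pre-aligned input")

    def column_is_shared(i: int) -> bool:
        return len({seq[i] for seq in seqs}) == 1

    nodes: Dict[int, str] = {}
    paths: Dict[str, List[int]] = {name: [] for name in names}
    next_id = 1
    start = 0
    while start < length:
        # Extend the run while columns stay all-shared or all-variable.
        shared = column_is_shared(start)
        end = start + 1
        while end < length and column_is_shared(end) == shared:
            end += 1
        # One node per distinct segment: a single node for a shared run,
        # one node per allele for a variable run.
        segment_ids: Dict[str, int] = {}
        for name, seq in zip(names, seqs):
            segment = seq[start:end]
            if segment not in segment_ids:
                segment_ids[segment] = next_id
                nodes[next_id] = segment
                next_id += 1
            paths[name].append(segment_ids[segment])
        start = end
    return nodes, paths


# Hypothetical toy haplotypes differing at a single position.
haplotypes = {"hapA": "ACGTTGCA", "hapB": "ACGATGCA", "hapC": "ACGTTGCA"}
nodes, paths = build_toy_graph(haplotypes)
print(nodes)
print(paths)
```

Each haplotype is recovered by concatenating the segments along its path, so the graph stores the shared flanks once while still distinguishing the two alleles at the variable site.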

Several years ago, the multidisciplinary team from Shanghai carried out pangenomic research on gastric cancer WGS data using HUPAN, an automated pipeline for linear pangenomic analysis. The resulting work, “Pangenomic analysis of Chinese gastric cancer,” was published in Nature Communications on September 15, 2022 [32]. Gastric cancer seriously endangers the life and health of East Asians, including Chinese [33]. Whether this distinct racial predisposition is associated with specific genomic variants has remained a mystery. Over the years, Chinese scientists have realized that gastric cancer results from the interaction between environmental factors and genetic susceptibility. External pathogenic factors such as Helicobacter pylori (Hp), Epstein-Barr virus, and dietary habits may be involved in gastric carcinogenesis [34,35]. However, little is known about the role of genomic variation in the initiation and development of gastric cancer, largely because effective experimental technologies have been lacking. The automated HUPAN pipeline developed by Shanghai scientists addressed this technical problem [36]. After analyzing WGS data from 185 pairs of gastric cancer and gastric mucosa tissues (370 samples) from Han Chinese patients, the scientists built the first pangenome of human gastric cancer, covering the human reference genome (GRCh38) and 80.88 Mb of new unmapped sequences. The study found a group of distributed genes (GSTM1, ACOT1, SIGLEC14, and UGT2B17) with a high frequency of deletion variations, also known as PAVs, in the Chinese gastric cancer population. In a comparative analysis with WGS data from different ethnic groups in public databases, the absence frequency of these four genes in the gastric cancer population ranged from 41% to 71%, much higher than that in Western descendants (4.6%–46%). The discovery of PAVs in these distributed genes revealed a possible genetic basis for the high incidence of gastric cancer in the Han Chinese population and thus provided a directional reference for targeted intervention therapy in clinical practice.
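The comparison of absence frequencies between cohorts comes down to simple counting, sketched below. The cohorts, the gene names `GENE_X` and `GENE_Y`, and the presence calls are hypothetical toy values chosen only to show the calculation; they are not data from the study, and the function `absence_frequency` is an illustrative helper rather than part of HUPAN.

```python
from typing import Dict, List


def absence_frequency(calls: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return, per gene, the fraction of individuals in which it is absent.

    calls: gene -> list of presence calls (True = present), one per individual
    """
    frequencies: Dict[str, float] = {}
    for gene, present_flags in calls.items():
        absent = sum(1 for present in present_flags if not present)
        frequencies[gene] = absent / len(present_flags)
    return frequencies


# Hypothetical toy cohorts with five individuals each.
cohort_a = {
    "GENE_X": [False, False, True, False, True],
    "GENE_Y": [True, True, True, True, False],
}
cohort_b = {
    "GENE_X": [True, True, False, True, True],
    "GENE_Y": [True, True, True, True, False],
}

freq_a = absence_frequency(cohort_a)
freq_b = absence_frequency(cohort_b)
for gene in sorted(freq_a):
    print(f"{gene}: cohort A {freq_a[gene]:.0%}, cohort B {freq_b[gene]:.0%}")
```

A real comparison would also need a statistical test and correction for cohort size and ancestry before a frequency difference could be interpreted.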

It is worth noting that the unmapped sequences found in pangenomic analysis are valuable genetic resources for further exploration. During in-depth analysis of the 80.88 Mb of unmapped sequences in the gastric cancer pangenome, the joint team from Shanghai predicted at least 14 new genes that were missing from the reference genome. Short reads produced by second-generation sequencing can lead to inaccurate gene annotation in a pangenome, whereas third-generation sequencing technology, which came out after 2011, has the advantage of longer sequencing reads [37–40]. The scientists used third-generation sequencing data to map the new genes onto chromosomes and successfully mapped the gene GC0643 to the 9q34.2 locus. They further explored the biological functions of GC0643 in vitro and found that it significantly inhibited growth, migration, invasion, and cell cycle progression and promoted apoptosis of cancer cells. GC0643 has been registered in the NCBI database (GenBank: MW194843.1) [32].

5 Perspective of the coming genomic era for everyone

The development of genome sequencing technologies over the past 70 years has revealed the roughly 20 000 genes that make us human. However, each person is unique, and that uniqueness lies in small differences in the genome. According to the HPRC pangenome draft, the difference between people accounts for about 0.4% of the whole genome [1]. As human pangenome analysis technology matures and the cost of third-generation sequencing falls, the probability of discovering long-sequence genomic variations (namely, gene “presence” or “absence” variations) will increase significantly. Pangenomic examination will lead to a paradigm shift in the diagnosis and treatment of diseases and cancers in the future.

In addition, Chinese scientists have noticed that, among the distributed genes with a high frequency of absence found in the pangenomic analysis of the Han Chinese gastric cancer population, some variations are shared with the archaic hominin genomes studied by the Swedish scientist Svante Pääbo (winner of the 2022 Nobel Prize in Physiology or Medicine). For example, SIGLEC14, a functional regulatory gene of innate immune cells, is frequently deleted in East Asians [41,42]. When this gene is deleted, the individual will lack sufficient immunity to cope with some pathogen infections, and the immune cells of carriers have difficulty producing cytokines such as tumor necrosis factor α [43–45]. Another gene, ACOT1, which is frequently deleted in Han Chinese, encodes acyl-CoA thioesterase 1. It is mainly expressed in adipose tissue and participates in the processing of long-chain and very-long-chain fatty acids [46,47]. The variation characteristics of these two genes are shared between the gastric cancer population and archaic hominin genomes. This unexpected discovery suggests that, by comparing the pangenomic characteristics of different human races, we may trace the trajectory of genome variation as humans migrated out of Africa and further decipher some mysteries that have puzzled the medical community for many years. Disease- or cancer-associated genetic elements identified by pangenomic analysis will promote the development of therapeutic approaches that target them. Pangenomic findings can help us understand racial differences in the incidence of certain diseases (or cancers) and the underlying mechanism by which the same therapeutic drugs are effective in patients from some races but not in others. Therefore, studies of disease or cancer genomes based on a pangenomic reference will assist doctors in developing better therapies and will promote precision medicine. As humankind enters the genomic era, disease diagnosis and treatment will draw on genomic information in the near future.

References

[1]

Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617(7960): 312–324

[2]

Gao Y, Yang X, Chen H, Tan X, Yang Z, Deng L, Wang B, Kong S, Li S, Cui Y, Lei C, Wang Y, Pan Y, Ma S, Sun H, Zhao X, Shi Y, Yang Z, Wu D, Wu S, Zhao X, Shi B, Jin L, Hu Z, Lu Y, Chu J, Ye K, Xu S. A pangenome reference of 36 Chinese populations. Nature 2023; 619(7968): 112–121

[3]

Li R, Li Y, Zheng H, Luo R, Zhu H, Li Q, Qian W, Ren Y, Tian G, Li J, Zhou G, Zhu X, Wu H, Qin J, Jin X, Li D, Cao H, Hu X, Blanche H, Cann H, Zhang X, Li S, Bolund L, Kristiansen K, Yang H, Wang J, Wang J. Building the sequence map of the human pan-genome. Nat Biotechnol 2010; 28(1): 57–63

[4]

Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 1953; 171(4356): 737–738

[5]

Wilkins MH, Stokes AR, Wilson HR. Molecular structure of deoxypentose nucleic acids. Nature 1953; 171(4356): 738–740

[6]

Franklin RE, Gosling RG. Molecular configuration in sodium thymonucleate. Nature 1953; 171(4356): 740–741

[7]

Attar N. Raymond Gosling: the man who crystallized genes. Genome Biol 2013; 14(4): 402

[8]

Edsall JT. Nobel Prize: two Britons, American share 1962 Award for Genetic Code Achievement. Science 1962; 138(3539): 498–500

[9]

Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977; 74(12): 5463–5467

[10]

Olson MV. The human genome project. Proc Natl Acad Sci USA 1993; 90(10): 4338–4344

[11]

Chen Z, Zhang S. Chinese Human Genome Project-opportunity and challenge. Chin J Med Genet (Zhonghua YiXue YiChuanXue ZaZhi) 1998; 15(4): 195–197

[12]

Han ZG, Zhao GP, Chen Z. Transcriptome study in China. C R Biol 2003; 326(10–11): 949–957

[13]

Collins FS, Morgan M, Patrinos A. The Human Genome Project: lessons from large-scale biology. Science 2003; 300(5617): 286–290

[14]

Garver KL, Garver B. The Human Genome Project and eugenic concerns. Am J Hum Genet 1994; 54(1): 148–158

[15]

Jarvie T. Next generation sequencing technologies. Drug Discov Today Technol 2005; 2(3): 255–260

[16]

Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 2016; 17(6): 333–351

[17]

International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004; 431(7011): 931–945

[18]

Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin CS, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 2017; 27(5): 849–864

[19]

Audano PA, Sulovari A, Graves-Lindsay TA, Cantsilieris S, Sorensen M, Welch AE, Dougherty ML, Nelson BJ, Shah A, Dutcher SK, Warren WC, Magrini V, McGrath SD, Li YI, Wilson RK, Eichler EE. Characterizing the major structural variant alleles of the human genome. Cell 2019; 176(3): 663–675.e19

[20]

Yang X, Lee WP, Ye K, Lee C. One reference genome is not enough. Genome Biol 2019; 20(1): 104

[21]

Golicz AA, Bayer PE, Bhalla PL, Batley J, Edwards D. Pangenomics comes of age: from bacteria to plant and animal applications. Trends Genet 2020; 36(2): 132–145

[22]

Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarit y Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, Khouri H, Radune D, Dimitrov G, Watkins K, O’Connor KJ, Smith S, Utterback TR, White O, Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci USA 2005; 102(39): 13950–13955

[23]

Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, Li M, Zheng T, Fuentes RR, Zhang F, Mansueto L, Copetti D, Sanciangco M, Palis KC, Xu J, Sun C, Fu B, Zhang H, Gao Y, Zhao X, Shen F, Cui X, Yu H, Li Z, Chen M, Detras J, Zhou Y, Zhang X, Zhao Y, Kudrna D, Wang C, Li R, Jia B, Lu J, He X, Dong Z, Xu J, Li Y, Wang M, Shi J, Li J, Zhang D, Lee S, Hu W, Poliakov A, Dubchak I, Ulat VJ, Borja FN, Mendoza JR, Ali J, Li J, Gao Q, Niu Y, Yue Z, Naredo MEB, Talag J, Wang X, Li J, Fang X, Yin Y, Glaszmann JC, Zhang J, Li J, Hamilton RS, Wing RA, Ruan J, Zhang G, Wei C, Alexandrov N, McNally KL, Li Z, Leung H. Genomic variation in 3010 diverse accessions of Asian cultivated rice. Nature 2018; 557(7703): 43–49

[24]

Gong Y, Li Y, Liu X, Ma Y, Jiang L. A review of the pangenome: how it affects our understanding of genomic variation, selection and breeding in domestic animals?. J Anim Sci Biotechnol 2023; 14(1): 73

[25]

Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, 
Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Deslattes Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X. The sequence of the human genome. Science 2001; 291(5507): 1304–1351

[26]

Llamas B, Narzisi G, Schneider V, Audano PA, Biederstedt E, Blauvelt L, Bradbury P, Chang X, Chin CS, Fungtammasan A, Clarke WE, Cleary A, Ebler J, Eizenga J, Sibbesen JA, Markello CJ, Garrison E, Garg S, Hickey G, Lazo GR, Lin MF, Mahmoud M, Marschall T, Minkin I, Monlong J, Musunuri RL, Sagayaradj S, Novak AM, Rautiainen M, Regier A, Sedlazeck FJ, Siren J, Souilmi Y, Wagner J, Wrightsman T, Yokoyama TT, Zeng Q, Zook JM, Paten B, Busby B. A strategy for building and using a human reference pangenome. F1000 Res 2019; 8: 1751

[27]

Duan Z, Qiao Y, Lu J, Lu H, Zhang W, Yan F, Sun C, Hu Z, Zhang Z, Li G, Chen H, Xiang Z, Zhu Z, Zhao H, Yu Y, Wei C. HUPAN: a pan-genome analysis pipeline for human genomes. Genome Biol 2019; 20(1): 149

[28]

Li H, Feng X, Chu C. The design and construction of reference pangenome graphs with minigraph. Genome Biol 2020; 21(1): 265

[29]

Eizenga JM, Novak AM, Sibbesen JA, Heumos S, Ghaffaari A, Hickey G, Chang X, Seaman JD, Rounthwaite R, Ebler J, Rautiainen M, Garg S, Paten B, Marschall T, Sirén J, Garrison E. Pangenome graphs. Annu Rev Genomics Hum Genet 2020; 21(1): 139–162

[30]

Garg S, Balboa R, Kuja J. Chromosome-scale haplotype-resolved pangenomics. Trends Genet 2022; 38(11): 1103–1107

[31]

Massarat A, Gymrek M, McStay B, Jónsson H. Human pangenome supports analysis of complex genomic regions. Nature 2023; 617(7960): 256–258

[32]

Yu Y, Zhang Z, Dong X, Yang R, Duan Z, Xiang Z, Li J, Li G, Yan F, Xue H, Jiao D, Lu J, Lu H, Zhang W, Wei Y, Fan S, Li J, Jia J, Zhang J, Ji J, Liu P, Lu H, Zhao H, Chen S, Wei C, Chen H, Zhu Z. Pangenomic analysis of Chinese gastric cancer. Nat Commun 2022; 13(1): 5412

[33]

Yoshida T, Yatabe Y, Kato K, Ishii G, Hamada A, Mano H, Sunami K, Yamamoto N, Kohno T. The evolution of cancer genomic medicine in Japan and the role of the National Cancer Center Japan. Cancer Biol Med 2023; 3: j.issn.2095-3941.2023.0036

[34]

Stewart OA, Wu F, Chen Y. The role of gastric microbiota in gastric cancer. Gut Microbes 2020; 11(5): 1220–1230

[35]

Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 2014; 513(7517): 202–209

[36]

Yu Y, Wei C. A powerful HUPAN on a pan-genome study: significance and perspectives. Cancer Biol Med 2020; 17(1): 1–5

[37]

Watson CM, Crinnion LA, Simmonds J, Camm N, Adlard J, Bonthron DT. Long-read nanopore sequencing enables accurate confirmation of a recurrent PMS2 insertion-deletion variant located in a region of complex genomic architecture. Cancer Genet 2021; 256–257: 122–126

[38]

Pareek CS, Smoczynski R, Tretyn A. Sequencing technologies and genome sequencing. J Appl Genet 2011; 52(4): 413–435

[39]

Venkatesan BM, Bashir R. Nanopore sensors for nucleic acid analysis. Nat Nanotechnol 2011; 6(10): 615–624

[40]

Ozsolak F. Third-generation sequencing techniques and applications to drug discovery. Expert Opin Drug Discov 2012; 7(3): 231–243

[41]

Tsai CM, Riestra AM, Ali SR, Fong JJ, Liu JZ, Hughes G, Varki A, Nizet V. Siglec-14 enhances NLRP3-inflammasome activation in macrophages. J Innate Immun 2020; 12(4): 333–343

[42]

Angata T, Hayakawa T, Yamanaka M, Varki A, Nakamura M. Discovery of Siglec-14, a novel sialic acid receptor undergoing concerted evolution with Siglec-5 in primates. FASEB J 2006; 20(12): 1964–1973

[43]

Yamanaka M, Kato Y, Angata T, Narimatsu H. Deletion polymorphism of SIGLEC14 and its functional implications. Glycobiology 2009; 19(8): 841–846

[44]

Varki A. Colloquium paper: uniquely human evolution of sialic acid genetics and biology. Proc Natl Acad Sci USA 2010; 107(Suppl 2): 8939–8946

[45]

Yu Y, Peng W. Recent progress in targeting the sialylated glycan-SIGLEC axis in cancer immunotherapy. Cancer Biol Med 2023; 20(5): 369–384

[46]

Lin YL, Pavlidis P, Karakoc E, Ajay J, Gokcumen O. The evolution and functional impact of human deletion variants shared with archaic hominin genomes. Mol Biol Evol 2015; 32(4): 1008–1019

[47]

Cavalli M, Diamanti K, Dang Y, Xing P, Pan G, Chen X, Wadelius C. The thioesterase ACOT1 as a regulator of lipid metabolism in type 2 diabetes detected in a multi-omics study of human liver. OMICS 2021; 25(10): 652–659

RIGHTS & PERMISSIONS

Higher Education Press
