A systematic survey of LU domain-containing proteins reveals a novel human gene, LY6A, which encodes the candidate ortholog of mouse Ly-6A/Sca-1 and is aberrantly expressed in pituitary tumors

Dan Liu , Chunhui Xu , Yanting Liu , Wen Ouyang , Shaojian Lin , Aining Xu , Yuanliang Zhang , Yinyin Xie , Qiuhua Huang , Weili Zhao , Zhu Chen , Lan Wang , Saijuan Chen , Jinyan Huang , Zhe Bao Wu , Xiaojian Sun

Front. Med. ›› 2023, Vol. 17 ›› Issue (3) : 458 -475.

PDF (11888KB)
Front. Med. ›› 2023, Vol. 17 ›› Issue (3) : 458 -475. DOI: 10.1007/s11684-022-0968-4
RESEARCH ARTICLE
RESEARCH ARTICLE

A systematic survey of LU domain-containing proteins reveals a novel human gene, LY6A, which encodes the candidate ortholog of mouse Ly-6A/Sca-1 and is aberrantly expressed in pituitary tumors

Author information +
History +
PDF (11888KB)

Abstract

The Ly-6 and uPAR (LU) domain-containing proteins represent a large family of cell-surface markers. In particular, mouse Ly-6A/Sca-1 is a widely used marker for various stem cells; however, its human ortholog is missing. In this study, based on a systematic survey and comparative genomic study of mouse and human LU domain-containing proteins, we identified a previously unannotated human gene encoding the candidate ortholog of mouse Ly-6A/Sca-1. This gene, hereby named LY6A, reversely overlaps with a lncRNA gene in the majority of exonic sequences. We found that LY6A is aberrantly expressed in pituitary tumors, but not in normal pituitary tissues, and may contribute to tumorigenesis. Similar to mouse Ly-6A/Sca-1, human LY6A is also upregulated by interferon, suggesting a conserved transcriptional regulatory mechanism between humans and mice. We cloned the full-length LY6A cDNA, whose encoded protein sequence, domain architecture, and exon‒intron structures are all well conserved with mouse Ly-6A/Sca-1. Ectopic expression of the LY6A protein in cells demonstrates that it acts the same as mouse Ly-6A/Sca-1 in their processing and glycosylphosphatidylinositol anchoring to the cell membrane. Collectively, these studies unveil a novel human gene encoding a candidate biomarker and provide an interesting model gene for studying gene regulatory and evolutionary mechanisms.

Keywords

LU domain-containing protein family / novel human gene / LY6A / pituitary tumor / biomarker / nonsynonymous SNP / GPI-anchored protein

Cite this article

Download citation ▾
Dan Liu, Chunhui Xu, Yanting Liu, Wen Ouyang, Shaojian Lin, Aining Xu, Yuanliang Zhang, Yinyin Xie, Qiuhua Huang, Weili Zhao, Zhu Chen, Lan Wang, Saijuan Chen, Jinyan Huang, Zhe Bao Wu, Xiaojian Sun. A systematic survey of LU domain-containing proteins reveals a novel human gene, LY6A, which encodes the candidate ortholog of mouse Ly-6A/Sca-1 and is aberrantly expressed in pituitary tumors. Front. Med., 2023, 17(3): 458-475 DOI:10.1007/s11684-022-0968-4

登录浏览全文

4963

注册一个新账户 忘记密码

1 Introduction

Since being discovered in the late 1970s [1,2], mouse lymphocyte antigen-6 (Ly-6) had been regarded as a prototype of the concept of “differentiation antigens,” which predicted that all cells/tissues in a single individual should possess, and could be distinguished by, different cell-surface antigens [3]. Interestingly, the complexity of the Ly-6 serological pattern implied the existence of “Ly-6 complex” antigens [4], and subsequent molecular and genomic studies indeed confirmed that these antigens are encoded by multiple genes located in clustered chromosomal regions [5,6]. Structurally, all Ly-6 antigens contain an evolutionarily conserved LU domain (named after Ly-6 and urokinase plasminogen activator receptor (uPAR)), which is characteristic of 10 conserved cysteine residues that form disulfide bridges and thereby create a “three-finger” structure [7]. In addition, many Ly-6 antigens also contain a glycosylphosphatidylinositol (GPI) anchor domain, thus being tethered to the cell membrane [8], whereas those lacking a GPI anchor domain may act as secreted proteins. According to the information provided by the Human Genome Organization Gene Nomenclature Committee (HGNC) and the Mouse Genomic Nomenclature Committee (MGNC), human and mouse genomes have 35 and 61 LU domain-containing genes, respectively [6] (updated information on human LU domain-containing genes is available on a specific page of the HGNC website).

Although the Ly-6 antigens were originally identified from lymphocytes, their expression turns out not to be restricted to a single tissue. Indeed, the founding member of the family, mouse Ly-6A (encoded by the Ly6a gene), has currently been widely known as stem cell antigen-1 (Sca-1) because it has been commonly used as a cell-surface marker of several types of tissue stem cells, including hematopoietic stem cells [9], mammary stem cells [10], prostate-regenerating cells [11], and lung bronchioalveolar stem cells [12], as well as various cancer stem/initiating cells (for review, see [13]). However, despite a considerable evolutionary conservation of the Ly-6 antigens between humans and mice, a human homolog(s) to the mouse Ly6a gene has been paradoxically missing [6]. In this regard, there are great mysteries why the mouse Ly-6A/Sca-1 has been selected (relative to its homologs) through evolution to be expressed in stem cells, whether humans (used to) have a homolog that may be similarly expressed in the corresponding types of cells, and how this gene(s) exerts biological functions in these cells.

In efforts to study the function of Ly-6A/Sca-1, two genetic Ly-6A/Sca-1 loss-of-function mouse models had been generated, and studies with both models showed that homozygous Ly-6A/Sca-1 null mice are essentially healthy in normal development [14,15], indicating that Ly-6A/Sca-1 is not required for the normal generation and functions of the cells that express it. Nevertheless, more careful phenotyping revealed that, under stress conditions (e.g., transplantation, immune stimulation, aging, injury), the Ly-6A/Sca-1 null mice exhibited notable, though sometimes controversial, phenotypes in several types of cells, including hematopoietic stem and progenitor cells [14,16,17], T cells [14,18,19], B cells [14,20], bone marrow stromal cells [21], and skeletal muscle cells [22,23]. These observations therefore open perspectives for understanding the potential function of Ly-6A/Sca-1 under stress and disease-like conditions. Indeed, studies with mouse models have led to new observations that Ly-6A/Sca-1 could be potentially associated with cancer-related mechanisms, such as the c-Kit [16,17] and TGF-β signaling pathways [24,25]. Thus, these findings imply that, although Ly-6A/Sca-1 seems to be nonessential for normal development, it may play an important role in tumorigenesis.

In this study, we performed an exhaustive survey of LU domains in mouse and human genomes and clarified a total of 130 LU domains which belong to 37 human and 64 mouse LU domain-containing genes. Among them, we discovered a previously unannotated human gene that likely encodes the ortholog of mouse Ly-6A/Sca-1, which was determined by the “reciprocal best hit” analysis. We cloned the full-length cDNA of this human LY6A gene from pituitary tumors, in which LY6A is aberrantly overexpressed compared with normal pituitary tissues and may contribute to tumorigenesis. Protein expression experiments validated that the LY6A protein can act the same as mouse Ly-6A/Sca-1 to be properly processed and tethered on the cell membrane through its GPI anchor domain. Interestingly, LY6A largely overlaps with, and is transcribed from the opposite strand of, a known lncRNA gene. The possible mutual regulation between these genes and technical issues in conventional transcriptome analysis may explain why this gene is rarely detected and thus has been “hidden” for decades.

2 Materials and methods

2.1 “Reciprocal best hit” analysis

The “reciprocal best hit” algorithm was used to search and define orthologs between different species as previously described [26,27]. In brief, by using the amino acid sequence of the mouse Ly-6A protein as a search query, a “tblastn” analysis was performed against the human genomic sequence. The matched human homologous genomic regions were used as queries to search the mouse nr protein database with “blastx”. Then, the retrieved and original amino acid sequences were compared with “blastp”. If they represented the same protein, the original mouse gene and the matched human homolog(s) were defined as orthologs. The GENSCAN program was used to predict exon/intron structures [28], which were further validated by mapping RNA-seq reads and molecular cloning.

2.2 Sequence alignment and phylogenetic analysis

Amino acid sequences of the LU domains (or the full-length Ly6a family proteins) were aligned with the ClustalX 2.1 program [29] with the BLOSUM series matrix; the end gap separation option was turned on. The resulting alignments were manually investigated using the BioEdit program. Based on the alignments, phylogenetic trees were constructed with the Molecular Evolutionary Genetics Analysis program (version 6) [30] using the neighbor-joining method with 1000 bootstrap replicates.

2.3 RNA-seq and data analyses

Pituitary tumor RNA-seq libraries were generated with mRNA purified with poly(T) oligos using the VAHTS Universal V6 RNA-seq Library Prep Kit for Illumina (Vazyme, NR604-01/02) and sequenced with the HiSeq X Ten or the Novaseq 6000 S4 platform (Illumina). Gene expression profiling of the pituitary tumor samples was performed as described, and our comprehensive analysis of the samples of 180 patients has recently been published [31]. The RNA-seq data of acute myeloid leukemia (AML) samples from patients and cell lines were derived from previously published studies [3235]. To validate the exon/intron structures, all the RNA-seq reads that were mapped to the genomic loci were retrieved, and each of them was manually examined to be exactly aligned to the genome by adhering to the “GT-AG” rule of the splice site pairs. This analysis allowed us to distinguish the correct transcripts from the opposite strands and exclude any contamination from genomic DNA or nonspecific transcripts. RNA-seq data were retrieved from the Encyclopedia of DNA Elements (ENCODE) project and the Genotype-Tissue Expression (GTEx) project to obtain a comprehensive profile of LY6A expression in different human tissues. The raw RNA-seq data were searched for LY6A-specific splicing junction reads with the “blastn” program using the sequences of exons 1–2 (5′-ATGGAGAGAGCTCAGGGTCT-3′) and exons 2–3 (5′-GTCTTTGGCAGTAAAGTGAG-3′) as queries. The numbers of junction reads were normalized to counts per million reads mapped. Gene set enrichment analysis and hierarchical cluster analysis of the RNA-seq data were performed as previously described [36].

2.4 RT-PCR analysis of gene expression

Total RNAs of various cells/tissues were extracted using MagZol reagent (Magen, R4801), and cDNA was synthesized with HiScript II Q Select RT SuperMix for qPCR (+ gDNA Wiper, Vazyme, R233). PCR was performed with 2× Taq Master Mix (Dye Plus, Vazyme, P112). A 341-bp fragment of the LY6A cDNA was amplified with a pair of primers (5′-CCTGCTGAGCATGGAGAGAG-3′ and 5′-AGAGGAAGACTGACCCCAGG-3′), and a 632-bp full-length LY6A cDNA was amplified with a pair of primers (5′-GTGCAGGACTCATTCGATCC-3′ and 5′-CACATCCTTCCCCTGATCTC-3′). The PCR products were analyzed with agarose gel electrophoresis and further validated by sequencing. As a loading control, GAPDH cDNA was amplified with the following pair of primers: 5′-GGAGCGAGATCCCTCCAAAAT-3′ and 5′-GGCTGTTGTCATACTTCTCATGG-3′.

2.5 Full-length cDNA cloning

A pituitary tumor sample was used for cloning the full-length LY6A cDNA. Total RNA was extracted with TRIzol reagent (Thermo Fisher Scientific, 15596018), and cDNA was synthesized with the PrimeScript RT Reagent Kit with gDNA Eraser (Takara, RR047A). PCR was performed with Ex Taq DNA Polymerase (Takara, RR001B). Full-length LY6A cDNA was amplified by PCR with the following primers: 5′-GTGCAGGACTCATTCGATCC-3′ and 5′-CACATCCTTCCCCTGATCTC-3′. The PCR product was purified with the Axygen AxyPrep PCR Clean-up Kit (Corning, AP-PCR-250), cloned into the T-Vector pMD19 (Takara, 3271), and subjected to sequencing. The sequence of the full-length cDNA was deposited in NCBI GenBank with accession number MW603613.

2.6 Induced expression by interferon-γ (IFN-γ)

Primary pituitary tumor cells were treated with recombinant human IFN-γ (10 ng/mL; Novoprotein, C014) for 24 h. RNA was extracted using MagZol reagent (Magen, R4801), and cDNA was synthesized by HiScript II Q Select RT SuperMix for qPCR (+ gDNA Wiper, Vazyme, R233). TB Green Premix Ex Taq II (Tli RNase H Plus, TaKaRa, RR820) was utilized to perform RT-qPCR with primers 5′-TGAAGACCTTGTCCCTGGTCC-3′ and 5′-GAGAGCTTGTTCTCCCCTCTCA-3′ on the Applied Biosystems ViiA 7 system. GAPDH was analyzed as the internal reference and for normalization with primers 5′-GGAGCGAGATCCCTCCAAAAT-3′ and 5′-GGCTGTTGTCATACTTCTCATGG-3′.

2.7 Protein expression and detection

The cDNAs encoding human LY6A (hLY6A) and mouse Ly-6A (mLy-6A), as well as those with deletion of the GPI anchor domain (ΔGPIa), were cloned into the pLVX-3 × FLAG–Halo vector. HEK293T cells in six-well plates were transfected with the indicated plasmids and cultured for 48 h. The culture medium was replaced with new medium containing 100 nmol/L HaloTag Alexa Fluor 660 Ligand (Promega, G8471) and incubated for 1 h. The cells were washed twice with PBS and lysed with RIPA buffer (Yoche Biotech, YSD0100) containing protease inhibitors (NCM Biotech, P001). The total protein concentrations of whole-cell lysates were analyzed with a BCA protein assay kit (Yoche Biotech, YSD-500T) and measured with a Biotek 800 TS microplate reader at the wavelength of 562 nm. Same amounts of the whole-cell lysates of different samples were loaded with SDS‒PAGE loading buffer (250 mmol/L Tris-HCl pH 6.8, 10% SDS, 50% glycerol, 0.05% bromophenol blue, and 5% β-mercaptoethanol), separated by SDS‒PAGE, and transferred to polyvinylidene difluoride membranes (Millipore, ISEQ10100). The fluorescent signals of the HaloTag Alexa Fluor 660 Ligand on the membranes were analyzed with a LI-COR Odyssey XF Infrared Imaging System. Then, the membranes were subjected to immunoblot analysis with the indicated antibodies and imaged using the Tanon 5200 Chemiluminescent Imaging System.

2.8 Generation of monoclonal stable cell lines

Monoclonal stable cell lines were generated with a lentiviral system. Briefly, HEK293T cells were transfected with lentiviral plasmids and the packaging plasmids, pMD2.G and pSPAX2, respectively. Virus-containing media were collected at 48 and 72 h after transfection, mixed, filtered through 0.45 μm nitrocellulose filters (Millipore, SLH-V033RS), and used to infect HEK293T cells in the presence of 5 μg/mL polybrene (Yeasen, 40804ES76). The infected cells were selected with 2 μg/mL puromycin (Beyotime, ST551). One-hundred cells were plated in 10 cm dish (Corning, 430167) and cultured for 10 days to obtain the monoclonal stable cell lines. When the single cell-derived clones were clearly visible in the dishes, individual clones were picked for expansion to generate monoclonal cell lines. Every cell line was validated for proper protein expression by immunoblot analysis.

2.9 Cell proliferation assays

Trypsin-digested cells were centrifuged and resuspended in culture medium, and cell numbers were counted by the Beckman Coulter Z2 cell counter. Three-thousand cells per well were seeded in 96-well plates, and cell viability was measured with a cell counting kit-8 (CCK-8) kit (APExBIO, K1018) after 0, 24, 48, 72, and 96 h. After addition of the CCK-8 solution, the plate was incubated at 37 °C for 1–4 h, and the absorbance at 450 nm was determined with the BioTek 800 TS microplate reader.

3 Results

3.1 Systematic survey and comparative genomic analysis of human and mouse LU domain-containing genes

A multiple-round, exhaustive “blast” search of human and mouse LU domains was performed against the nonredundant (nr) protein and genome databases to identify full-length proteins/genes and individual LU domains. First, for the annotated protein sequences, we identified 36 human and 64 mouse LU domain-containing proteins (Tab.1), which are more than those provided by the HGNC and MGNC databases. Second, many of these proteins contain multiple LU domains; thus, we researched the individual LU domains in these proteins by elaborate homology analysis. We recognized 129 unique LU domains, including many previously unrecognized domains (Tab.1 and Fig.1–1C). For example, homology analysis demonstrated a complicated arrangement of multiple LU domains in mouse and human CD177 proteins (Fig.1), and indeed, they were clarified to contain 8 and 4 LU domains, respectively (Fig.1). Third, we found several matched hits located in genomic regions (e.g., human chromosome 8q24.3) where no annotated protein-coding gene resides; this notion may suggest the existence of novel genes or pseudogenes that lost their open reading frames (ORFs) (see below for further analysis).

Because some of the LU domain-containing proteins have multiple LU domains and nonhomologous regions, they are not readily applicable for direct sequence alignment and evolutionary analysis. Therefore, we performed a phylogenetic analysis with all the individual LU domains (Fig.1). The results revealed that, while most of the human and mouse genes can form clear orthologous groups (though several gene expansion events occur within the groups), mouse Ly6a is classified in an exceptional mouse-specific cluster without a human ortholog (Fig.1). In the mouse genome, the Ly6a cluster resides in a well-defined region in chromosome 15qD3 that contains several other Ly-6 family genes, and this region is syntenic with a region of human chromosome 8q24.3 (Fig.2). Immediately adjacent to the mouse Ly6a cluster are Ly6e and Ly6l, both of which clearly have their human orthologs and, according to the phylogenetic relationships, neither could serve as a parental gene of the Ly6a cluster through evolution. Notably, however, in the human genomic region corresponding to the Ly6a cluster, there is a single annotated human-specific gene named C8orf31 (also known as LINC02904) (Fig.2), which is currently considered as a lncRNA gene and thus a possible regulator of gene expression [37].

3.2 Identification of the missing human LY6A gene

As our LU domain searching revealed several novel hits residing in the syntenic 8q24.3 region of the human genome, we set out to determine whether this region may harbor unannotated gene(s) homologous to the mouse Ly6a cluster genes and whether the human C8orf3/LINC02904 gene has any possible relevance. We used the “reciprocal best hit” approach [27,38] to reanalyze this ~63.5 kb region, followed by validation of the precise splice sites, molecular cloning, and structural analysis of the candidate homolog gene(s) (Fig.2). As a result, we first verified three regions of the human genomic sequence whose encoded amino acid sequences showed considerable homology with the LU domain of mouse Ly-6A, especially in the conserved cysteines (Fig.2). Two of these regions lying ~1.3 kb apart matched successive parts of the query Ly-6A protein, ranging from the very N-terminus of the LU domain to a part of the GPI anchor domain; the third region, being located ~17.4 kb away, matched the N-terminal part of the LU domain with relatively lower homology (Fig.2). In combination with GENSCAN exon‒intron prediction, the results suggest that two putative genes may be located within this region, and each of them contains notable potential exons. Interestingly, putative gene #1 resides in the C8orf3/LINC02904 gene locus but is transcribed from the opposite strand, which implies an unusual regulatory pattern.

We reasoned that the exon‒intron structures of these genes could be clarified by analyzing the sequences of specific reads in RNA-seq data. This approach could also provide convincing evidence for the expression of these genes. To this end, we analyzed our RNA-seq raw data of AML and pituitary tumor samples. From each sample, we first retrieved all the reads that were mapped to these loci and then performed a manual examination of how individual reads were exactly aligned to the genome (Fig.2). Notably, only the reads spanning spliced exon‒exon junctions were applicable for this purpose, because (1) these reads could exclude contaminations from genomic DNA or nonspecific transcripts and (2) their transcription directions could be determined by sticking to the “GT-AG” rule of the splice site pairs, which was important to distinguish the transcripts from opposite strands (Fig.2). As a result, we found that putative gene #1 was properly spliced in some pituitary tumors (though not in the AML samples), whereas putative gene #2 was not spliced in any sample. Furthermore, detailed analysis of the spliced reads revealed that, while the C8orf3/LINC02904 gene has 10 exons (including alternative exons) and produces multiple splicing isoforms, putative gene #1 contains 3 exons that jointly form a major ORF (Fig.2). Therefore, putative gene #1 was likely the only active Ly6a homologous gene within this region and was thus named human LY6A.

3.3 Full-length cloning, interferon induction, and gene expression profiling of LY6A

We then set out to clone the full-length cDNAs of LY6A from pituitary tumor samples whose RNA-seq data have already been validated for LY6A expression. Given that the three exons of LY6A largely overlap with those of C8orf3/LINC02904, we found that in many conditions, the PCR amplicons contained both LY6A and C8orf3/LINC02904 cDNAs, but they could eventually be distinguishable according to the splicing patterns. Only the use of LY6A-specific primers, which must be located in very limited regions (e.g., the 5′ end of exon 3 of LY6A), could produce specific LY6A cDNAs (Fig.2). This analysis showed that LY6A was expressed in pituitary tumors but not in AML cell lines (Fig.3). Full-length LY6A cDNA was cloned from a pituitary tumor, and the sequence was deposited in NCBI GenBank with accession number MW603613.

An important characteristic of the expression of mouse Ly6a is that it can be induced by IFNs [39,40]. We therefore tested whether human LY6A also possesses this characteristic. We found that treating pituitary tumor cells with IFN-γ indeed remarkably increased LY6A expression (Fig.3). Data mining and RT-PCR validation showed that LY6A expression could also be upregulated by IFN-γ in other cell types, such as the stomach carcinoma cell line MKN-1 (data not shown). These results indicate the conserved transcriptional regulatory mechanism of LY6A between mice and humans.

Direct sequencing of the LY6A-specific PCR amplicons validated the splice sites (exons 1–2 and 2–3) of LY6A (Fig.3). Using these splice junction site sequences as queries, we were able to obtain an overall expression profile of LY6A in searching the ENCODE [41] and GTEx databases [42]. The results showed that LY6A has relatively higher expression in the spleen, testis, and subtypes of adipose tissues (Fig.3), which appeared distinct from the expression patterns of other LU domain-containing genes in human tissues (Fig.3). These patterns were validated by our RT-PCR analysis of several collected human tissues (Fig.3 and data not shown). Furthermore, we obtained a publicly available data set of RNA-seq analysis of human spleen tissues [43], which includes the sequencing results of multiple samples of the total RNA, polyA RNA, and small RNA of spleen tissue, splenic endothelial cells, and splenic fibroblasts. Our reanalysis of this data set revealed that LY6A-specific reads were detected in the total RNA and polyA RNA (but not small RNA) of the total spleen tissue. By contrast, no LY6A-specific read was detected in the samples of splenic endothelial cells and fibroblasts (Fig.3). Therefore, these results further support the notion that the LY6A gene can indeed be actively transcribed into polyadenylated mRNA within spleen tissue, whereas the absence of LY6A mRNA from splenic endothelial cells and fibroblasts may imply cell-type specificity.

3.4 Aberrant LY6A overexpression in pituitary tumors and its potential relevance to tumor growth advantage

The gene expression profiles suggest that LY6A might have very low or no expression in normal pituitary tissues (Fig.3). This result was confirmed by our RT-PCR experiments, in which, for example, LY6A could be clearly amplified from the spleen and a subtype of adipose tissue but was not detectable in two pituitary tissue samples (Fig.3). However, LY6A was found to be remarkably expressed in many pituitary tumors (Fig.2 and data not shown). In particular, our RNA-seq analysis of pituitary tumor samples revealed that 3 out of 5 patients with Cushing’s disease had relatively high LY6A expression levels (Fig.4). Importantly, gene set enrichment analysis and unsupervised hierarchical cluster analyses of these tumor samples showed that tumors with high LY6A expression possess transcriptomic features of higher energy metabolism, proliferation, and neuron synapses and lower innate immune response (Fig.4 and 4C). This aberrant LY6A overexpression in human pituitary tumors is consistent with a recent study with a mouse pituitary tumor model, in which Ly-6A/Sca-1+ cells showed tumor growth advantages [44]. Thus, these findings collectively suggest the possible functional contributions of LY6A to tumorigenesis.

3.5 Evolutionary conservation of the LY6A-encoded protein sequence, domain architecture, and exon‒intron structure and identification of a high-frequency nonsynonymous coding single-nucleotide polymorphism (SNP)

We then set out to analyze whether LY6A encodes a functional protein. Alignment of the encoded protein by LY6A with those of the mouse Ly6a cluster showed substantial conservation among these family members, including the conserved cysteines and the three functional domains (i.e., signal peptide, LU domain, and GPIa domain; Fig.5). Moreover, the residue positions corresponding to the exon‒exon junctions of human LY6A are also well conserved compared with all the mouse Ly6a cluster genes (Fig.5). Furthermore, the phylogenetic analysis of the LY6A protein showed that it is clearly classified into a clade of the Ly6a cluster (Fig.5). Therefore, in combination with the abovementioned “reciprocal best hit” analysis and the conserved IFN induction of expression, these results collectively suggest that the human LY6A gene may represent not only a common ancestor of the mouse Ly6a cluster genes, but also the gene encoding the candidate ortholog of mouse Ly-6A/Sca-1 protein.

Interestingly, in our patients with pituitary tumor, we found a genetic variation of LY6A: two alleles of LY6A differ in a single nucleotide that changes the amino acid residue 112 from glycine (G) to serine (S) (Fig.5 and 5C). Although this variation has been annotated as a noncoding SNP (accession number rs11136300) in dbSNP, it should now be defined as a nonsynonymous coding SNP according to its ability to change the encoded amino acid sequence. Moreover, the two alleles of this SNP have relatively high frequencies (overall, 64% of the “G” allele and 36% of the “S” allele), and their frequencies seem to be remarkably different between races of people (Fig.5 and data not shown), implying the potential role of LY6A in human evolution.

3.6 Proper expression, processing, and membrane tethering of the LY6A protein

Biosynthesis of GPI-anchored proteins requires complicated, tightly regulated mechanisms, including transmembrane, signal peptide cleavage, posttranslational modification, and processing in the endoplasmic reticulum. We expressed LY6A in HEK293T cells with insertions of two tags, namely, a 3×FLAG tag for detecting the total protein and a HaloTag for detecting its membrane tethering (as the HaloTag could be labeled by a cell-impermeant, fluorescent ligand), to validate its proper protein expression, processing, and GPI anchoring outside of the cell membrane (Fig.6). Mouse Ly6a was used as the positive control. The results of immunoblot and fluorescent imaging analyses showed that the human and mouse proteins can be properly expressed, processed, and tethered to the membrane and that their membrane tethering is definitely dependent on their GPIa domains (Fig.6 and 6B). To evaluate the effect of the enforced LY6A expression on cell proliferation, we generated multiple monoclonal HEK293T cell lines that stably express LY6A and used cell lines expressing the empty vector gene or mouse Ly6a as the controls (Fig.6). As a result, the cell lines expressing human LY6A exhibited slightly increased proliferation compared with the empty vector control cell lines (Fig.6). Notably, the HEK293T cell lines expressing mouse Ly6a did not show this increase in proliferation probably because of the interspecies differences between humans and mice. Together, the results indicate that the human LY6A gene identified herein is complete and intact in its gene/protein structure; its tightly regulated transcription, protein biosynthesis, and trafficking; and its residing in the proper place to exert its biological functions.

4 Discussion

In this study, we identified and cloned the long-missing human LY6A gene. Concrete evidence of the existence and identity of this gene includes (1) the syntenic relationship of the gene clusters on human and mouse chromosomes; (2) the shared IFN-inducible expression; (3) the similar exon‒intron structures with precise splice consensus sequences; (4) the high homology of the encoded proteins, especially the conserved functional domains/residues; (5) the well-supported phylogenetic relationships; and (6) the proper protein expression, processing, and membrane-tethering. It is surprising that a novel human gene encoding the candidate ortholog of mouse Ly-6A/Sca-1 was discovered over 20 years after the complete sequencing of the human genome [45,46] and over 40 years after the first identification of the mouse Ly-6 antigens with great interest [1,2]. This notion may reflect that our understanding of the human genome and its function and regulation is still far from complete. Indeed, the present analysis of LY6A has raised several interesting questions on the evolution, expression, and function of the Ly-6 family genes, as well as some technical problems in the widely used transcriptome analysis technologies, which may miss some “hidden” genes.

The evolutionary origin of LY6A was investigated by the “reciprocal best hit” algorithm, a definitive method for the prediction of one-to-one orthologs [38]. The results showed that human LY6A can only be reciprocally matched with mouse Ly6a but not with any other Ly-6 family members, thus defining the one-to-one orthologous relationship between the human LY6A and mouse Ly6a genes. Notably, in combination with our phylogenetic analysis, these findings suggest two interesting notions: (1) the human LY6A gene represents a common ancestor of a cluster of expanded mouse homologous genes, and within this cluster, the mouse Ly6a gene seems to be the most evolutionarily conserved (thus possibly the “oldest”) member; and (2) the tendency that the “older” Ly6a and its “younger” homologs play their roles in stem cells versus differentiated progeny cells, respectively, may reflect the “Evo-Devo” discipline as a coordinative programming between evolution and development [47]. These notions thus imply the possible evolutionary history of the Ly-6 family genes that could be relevant to their functions.

Notably, compared with the overall high-level evolutionary conservation between humans and mice, the Ly-6 family genes are relatively divergent. This divergence is reflected by the existence of different gene numbers in humans and mice (37 vs. 64; including the human LY6A identified herein), as well as their possibly different expression patterns. The underlying mechanistic prerequisites for these phenomena may include the following: (1) the emergence of many of these genes during evolution was actually posterior to the specification of the cells that currently express them; and (2) these genes individually might not be essential for the survival and basic functions of the cells. These prerequisites may allow the (re)distribution or functional replacement of these genes in different cell types through evolution and thus lead to divergence in gene numbers and expression patterns.

How is LY6A expression regulated? Based on current knowledge, the expression specificity of Ly-6 family genes falls into two categories: cell/tissue specificity and cytokine inducibility. Notably, mechanistic analysis of the cis-regulatory elements of the mouse Ly6a gene showed that the DNase I hypersensitive sites responsible for IFN induction largely overlap with those for its natural expression in specific cells [48]. Therefore, the conserved IFN-induced expression of LY6A may suggest that the human and mouse genes share transcriptional regulatory mechanisms and that an evolutionary analysis of these mechanisms will be helpful to understand how the specific expression patterns of these genes are regulated.

Another interesting question is how the dual genes (i.e., LY6A and C8orf3/LINC02904) originate and cooperatively evolve. Considering the long evolutionary history of the Ly-6 gene family, LY6A likely emerged much earlier than C8orf3/LINC02904. Subsequently, C8orf3/LINC02904 seems to dominate their competition, whereas LY6A still doggedly survives. However, despite such competition, they must withstand selective pressure together, because their opposite overlapping pattern indicates that many nucleotides of the “noncoding” C8orf3/LINC02904 are in fact selected based on the “coding” LY6A [49]. Indeed, although thousands of overlapping gene pairs (including those on the same and opposite strands) have been observed in mammalian genomes, a gene pair overlapping with the majority of their exonic sequences is rare [50]. Thus, the identification of the human LY6A gene provides a rare and interesting model gene for evolutionary studies.

A unique feature of the human LY6A gene is that it is well “hidden” within the noncoding RNA gene C8orf3/LINC02904, and this situation may provide possible explanations for why human LY6A has been missed for decades: from a mechanistic perspective, LY6A expression may be specifically repressed by C8orf3/LINC02904 at the transcriptional and posttranscriptional levels [51], and, technically, the short reads produced by next-generation sequencing may be misreported as the transcripts of C8orf3/LINC02904 but not the currently unannotated LY6A. Indeed, our analyses clearly indicate that only a spliced read that spans an intron ended by splice donor and acceptor sites can be defined as a LY6A transcript, whereas the nonspliced reads cannot be defined correctly at all; therefore, using a regular standard (e.g., FPKM, RPKM) to evaluate the expression levels of LY6A or C8orf3/LINC02904 is not reliable.

Lessons from this study reflect several technical issues in transcriptome analysis technologies that may lead to data misinterpretation and missing “hidden” genes. First, because the current method for sequence alignment usually does not consider the “GT-AG” rule of the splice sites, some reads could be split incorrectly, leading to the misalignment of exons. Therefore, the findings suggest that sequence alignment methods could be optimized at least in some settings to involve the algorithms that recognize splicing site pairs (as well as other gene features), as did the GENSCAN program for accurate gene prediction [28]. Second, for genomic regions containing overlapping genes (which are not rare if considering many noncoding transcripts), unsplit reads should be used with caution for evaluating gene expression levels because distinguishing which genes they are derived from is impossible. In this situation, what is suggested is to use only split reads that fit the “GT-AG” rule or to consider full-length sequencing approaches as optional methods. Third, because the majority of the exonic regions of the LY6A and C8orf3/LINC02904 mRNAs overlap, they can form a stable double-stranded RNA “duplex”. It has been known that stable double-stranded or highly structured RNAs may affect the efficiency of cDNA synthesis (e.g., through reverse transcriptase drop-off), adapter ligation, and PCR and thus affect the analysis of the quantity and integrity of RNAs in regular experimental conditions [52]. Therefore, optimized methods need to be developed to address these problems and to achieve a faithful profile of the vast variety of RNAs in cells [53,54].

Finally, the identification of human LY6A may provide a new biomarker for cancer. As many Ly-6 family genes show highly specific expression patterns in a cellular differentiation stage- and lineage-specific manner [6], they are commonly used as cell-surface markers for the isolation of specific cell populations [13]. In cancer, abnormal expression patterns of these genes are frequently observed and associated with prognosis, but the underlying mechanisms remain unclear [55]. In this regard, the identification of human LY6A may provide a new biomarker for these purposes. Although LY6A was not detectable in normal pituitary tissues, its expression in pituitary tumors and apparent correlation with tumor-promoting transcriptomic features suggest that LY6A could be repressed in normal pituitary tissues but derepressed through a certain mechanism, and might play a functional role, during tumorigenesis.

References

[1]

Feeney AJ, Hämmerling U. Ala-1: a murine alloantigen of activated lymphocytes. Immunogenetics 1976; 3(1): 369–379

[2]

Woody JN, Feldmann M, Beverley PC, McKenzie IF. Expression of alloantigens LY-5 and LY-6 on cytotoxic effector cells. J Immunol 1977; 118(5): 1739–1743

[3]

Boyse EA, Old LJ. Some aspects of normal and abnormal cell surface genetics. Annu Rev Genet 1969; 3(1): 269–290

[4]

Houlden BA, Hogarth PM, McKenzie IF. Interrelationships of the “Ly-6 complex” antigens. Immunogenetics 1986; 23(4): 226–232

[5]

LeClair KP, Palfree RG, Flood PM, Hammerling U, Bothwell A. Isolation of a murine Ly-6 cDNA reveals a new multigene family. EMBO J 1986; 5(12): 3227–3234

[6]

Loughner CL, Bruford EA, McAndrews MS, Delp EE, Swamynathan S, Swamynathan SK. Organization, evolution and functions of the human and mouse Ly6/uPAR family genes. Hum Genomics 2016; 10(1): 10

[7]

Gumley TP, McKenzie IF, Sandrin MS. Tissue expression, structure and function of the murine Ly-6 family of molecules. Immunol Cell Biol 1995; 73(4): 277–296

[8]

Reiser H, Oettgen H, Yeh ET, Terhorst C, Low MG, Benacerraf B, Rock KL. Structural characterization of the TAP molecule: a phosphatidylinositol-linked glycoprotein distinct from the T cell receptor/T3 complex and Thy-1. Cell 1986; 47(3): 365–370

[9]

Spangrude GJ, Heimfeld S, Weissman IL. Purification and characterization of mouse hematopoietic stem cells. Science 1988; 241(4861): 58–62

[10]

Welm BE, Tepera SB, Venezia T, Graubert TA, Rosen JM, Goodell MA. Sca-1(pos) cells in the mouse mammary gland represent an enriched progenitor cell population. Dev Biol 2002; 245(1): 42–56

[11]

Xin L, Lawson DA, Witte ON. The Sca-1 cell surface marker enriches for a prostate-regenerating cell subpopulation that can initiate prostate tumorigenesis. Proc Natl Acad Sci USA 2005; 102(19): 6942–6947

[12]

Kim CF, Jackson EL, Woolfenden AE, Lawrence S, Babar I, Vogel S, Crowley D, Bronson RT, Jacks T. Identification of bronchioalveolar stem cells in normal lung and lung cancer. Cell 2005; 121(6): 823–835

[13]

Holmes C, Stanford WL. Concise review: stem cell antigen-1: expression, function, and enigma. Stem Cells 2007; 25(6): 1339–1347

[14]

Stanford WL, Haque S, Alexander R, Liu X, Latour AM, Snodgrass HR, Koller BH, Flood PM. Altered proliferative response by T lymphocytes of Ly-6A (Sca-1) null mice. J Exp Med 1997; 186(5): 705–717

[15]

Hanson P, Mathews V, Marrus SH, Graubert TA. Enhanced green fluorescent protein targeted to the Sca-1 (Ly-6A) locus in transgenic mice results in efficient marking of hematopoietic stem cells in vivo. Exp Hematol 2003; 31(2): 159–167

[16]

Ito CY, Li CY, Bernstein A, Dick JE, Stanford WL. Hematopoietic stem cell and progenitor defects in Sca-1/Ly-6A-null mice. Blood 2003; 101(2): 517–523

[17]

Bradfute SB, Graubert TA, Goodell MA. Roles of Sca-1 in hematopoietic stem/progenitor cell function. Exp Hematol 2005; 33(7): 836–843

[18]

Whitmire JK, Eam B, Whitton JL. Mice deficient in stem cell antigen-1 (Sca1, Ly-6A/E) develop normal primary and memory CD4+ and CD8+ T-cell responses to virus infection. Eur J Immunol 2009; 39(6): 1494–1504

[19]

Zhang ZX, Stanford WL, Zhang L. Ly-6A is critical for the function of double negative regulatory T cells. Eur J Immunol 2002; 32(6): 1584–1592

[20]

Jones MA, DeWolf S, Vacharathit V, Yim M, Spencer S, Bamezai AK. Investigating B cell development, natural and primary antibody responses in Ly-6A/Sca-1 deficient mice. PLoS One 2016; 11(6): e0157271

[21]

Bonyadi M, Waldman SD, Liu D, Aubin JE, Grynpas MD, Stanford WL. Mesenchymal progenitor self-renewal deficiency leads to age-dependent osteoporosis in Sca-1/Ly-6A null mice. Proc Natl Acad Sci USA 2003; 100(10): 5840–5845

[22]

Mitchell PO, Mills T, O’Connor RS, Kline ER, Graubert T, Dzierzak E, Pavlath GK. Sca-1 negatively regulates proliferation and differentiation of muscle cells. Dev Biol 2005; 283(1): 240–252

[23]

Epting CL, López JE, Pedersen A, Brown C, Spitz P, Ursell PC, Bernstein HS. Stem cell antigen-1 regulates the tempo of muscle repair through effects on proliferation of alpha7 integrin-expressing myoblasts. Exp Cell Res 2008; 314(5): 1125–1135

[24]

Upadhyay G, Yin Y, Yuan H, Li X, Derynck R, Glazer RI. Stem cell antigen-1 enhances tumorigenicity by disruption of growth differentiation factor-10 (GDF10)-dependent TGF-β signaling. Proc Natl Acad Sci USA 2011; 108(19): 7820–7825

[25]

Long KK, Montano M, Pavlath GK. Sca-1 is negatively regulated by TGF-β1 in myogenic cells. FASEB J 2011; 25(4): 1156–1165

[26]

Song HD, Sun XJ, Deng M, Zhang GW, Zhou Y, Wu XY, Sheng Y, Chen Y, Ruan Z, Jiang CL, Fan HY, Zon LI, Kanki JP, Liu TX, Look AT, Chen Z. Hematopoietic gene expression profile in zebrafish kidney marrow. Proc Natl Acad Sci USA 2004; 101(46): 16240–16245

[27]

Sun XJ, Xu PF, Zhou T, Hu M, Fu CT, Zhang Y, Jin Y, Chen Y, Chen SJ, Huang QH, Liu TX, Chen Z. Genome-wide survey and developmental expression mapping of zebrafish SET domain-containing genes. PLoS One 2008; 3(1): e1499

[28]

Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 1997; 268(1): 78–94

[29]

Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. Clustal W and Clustal X version 2.0. Bioinformatics 2007; 23(21): 2947–2948

[30]

Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 2013; 30(12): 2725–2729

[31]

Yao H, Xie W, Dai Y, Liu Y, Gu W, Li J, Wu L, Xie J, Rui W, Ren B, Xue L, Cheng Y, Lin S, Li C, Tang H, Wang Y, Lou M, Zhang X, Hu R, Shang H, Huang J, Wu ZB. TRIM65 determines the fate of a novel subtype of pituitary neuroendocrine tumors via ubiquitination and degradation of TPIT. Neuro-oncol 2022; 24(8): 1286–1297

[32]

Sun XJ, Wang Z, Wang L, Jiang Y, Kost N, Soong TD, Chen WY, Tang Z, Nakadai T, Elemento O, Fischle W, Melnick A, Patel DJ, Nimer SD, Roeder RG. A stable transcription factor complex nucleated by oligomeric AML1-ETO controls leukaemogenesis. Nature 2013; 500(7460): 93–97

[33]

Liu N, Song J, Xie Y, Wang XL, Rong B, Man N, Zhang MM, Zhang Q, Gao FF, Du MR, Zhang Y, Shen J, Xu CH, Hu CL, Wu JC, Liu P, Zhang YL, Xie YY, Liu P, Huang JY, Huang QH, Lan F, Shen S, Nimer SD, Chen Z, Chen SJ, Roeder RG, Wang L, Sun XJ. Different roles of E proteins in t(8;21) leukemia: E2-2 compromises the function of AETFC and negatively regulates leukemogenesis. Proc Natl Acad Sci USA 2019; 116(3): 890–899

[34]

Zhang MM, Liu N, Zhang YL, Rong B, Wang XL, Xu CH, Xie YY, Shen S, Zhu J, Nimer SD, Chen Z, Chen SJ, Roeder RG, Lan F, Wang L, Huang QH, Sun XJ. Destabilization of AETFC through C/EBPα-mediated repression of LYL1 contributes to t(8;21) leukemic cell differentiation. Leukemia 2019; 33(7): 1822–1827

[35]

Zhang YF, Wang XL, Xu CH, Liu N, Zhang L, Zhang YM, Xie YY, Zhang YL, Huang QH, Wang L, Chen Z, Chen SJ, Roeder RG, Shen S, Xue K, Sun XJ. A direct comparison between AML1-ETO and ETO2-GLIS2 leukemia fusion proteins reveals context-dependent binding and regulation of target genes and opposite functions in cell differentiation. Front Cell Dev Biol 2022; 10: 992714

[36]

Zhang F, Zeng QY, Xu H, Xu AN, Liu DJ, Li NZ, Chen Y, Jin Y, Xu CH, Feng CZ, Zhang YL, Liu D, Liu N, Xie YY, Yu SH, Yuan H, Xue K, Shi JY, Liu TX, Xu PF, Zhao WL, Zhou Y, Wang L, Huang QH, Chen Z, Chen SJ, Zhou XL, Sun XJ. Selective and competitive functions of the AAR and UPR pathways in stress-induced angiogenesis. Cell Discov 2021; 7(1): 98

[37]

Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem 2012; 81(1): 145–166

[38]

Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet 2005; 39(1): 309–338

[39]

Dumont FJ, Coker LZ. Interferon-α/β enhances the expression of Ly-6 antigens on T cells in vivo and in vitro. Eur J Immunol 1986; 16(7): 735–740

[40]

Dumont FJ, Boltz RC. The augmentation of surface Ly-6A/E molecules in activated T cells is mediated by endogenous interferon-gamma. J Immunol 1987; 139(12): 4088–4095

[41]

Luo Y, Hitz BC, Gabdank I, Hilton JA, Kagda MS, Lam B, Myers Z, Sud P, Jou J, Lin K, Baymuradov UK, Graham K, Litton C, Miyasato SR, Strattan JS, Jolanki O, Lee JW, Tanaka FY, Adenekan P, O’Neill E, Cherry JM. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Res 2020; 48(D1): D882–D889

[42]

GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 2020; 369(6509): 1318–1330

[43]

Lorenzi L, Chiu HS, Avila Cobos F, Gross S, Volders PJ, Cannoodt R, Nuytens J, Vanderheyden K, Anckaert J, Lefever S, Tay AP, de Bony EJ, Trypsteen W, Gysens F, Vromman M, Goovaerts T, Hansen TB, Kuersten S, Nijs N, Taghon T, Vermaelen K, Bracke KR, Saeys Y, De Meyer T, Deshpande NP, Anande G, Chen TW, Wilkins MR, Unnikrishnan A, De Preter K, Kjems J, Koster J, Schroth GP, Vandesompele J, Sumazin P, Mestdagh P. The RNA Atlas expands the catalog of human non-coding RNAs. Nat Biotechnol 2021; 39(11): 1453–1465

[44]

Donangelo I, Ren SG, Eigler T, Svendsen C, Melmed S. Sca1+ murine pituitary adenoma cells show tumor-growth advantage. Endocr Relat Cancer 2014; 21(2): 203–216

[45]

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409(6822): 860–921

[46]

Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Deslattes Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X. The sequence of the human genome. Science 2001; 291(5507): 1304–1351

[47]

Raff RA. Evo-devo: the evolution of a new discipline. Nat Rev Genet 2000; 1(1): 74–79

[48]

Sinclair AM, Dzierzak EA. Cloning of the complete Ly-6E.1 gene and identification of DNase I hypersensitive sites corresponding to expression in hematopoietic cells. Blood 1993; 82(10): 3052–3062

[49]

McLysaght A, Guerzoni D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Philos Trans R Soc Lond B Biol Sci 2015; 370(1678): 20140332

[50]

Adelman JP, Bond CT, Douglass J, Herbert E. Two mammalian genes transcribed from opposite strands of the same DNA locus. Science 1987; 235(4795): 1514–1517

[51]

Katayama S, Tomaru Y, Kasukawa T, Waki K, Nakanishi M, Nakamura M, Nishida H, Yap CC, Suzuki M, Kawai J, Suzuki H, Carninci P, Hayashizaki Y, Wells C, Frith M, Ravasi T, Pang KC, Hallinan J, Mattick J, Hume DA, Lipovich L, Batalov S, Engström PG, Mizuno Y, Faghihi MA, Sandelin A, Chalk AM, Mottagui-Tabar S, Liang Z, Lenhard B, Wahlestedt C; RIKEN Genome Exploration Research Group; Genome Science Group (Genome Network Project Core Group); FANTOM Consortium. Antisense transcription in the mammalian transcriptome. Science 2005; 309(5740): 1564–1566

[52]

Strobel EJ, Yu AM, Lucks JB. High-throughput determination of RNA structures. Nat Rev Genet 2018; 19(10): 615–634

[53]

Hu JF, Yim D, Ma D, Huber SM, Davis N, Bacusmo JM, Vermeulen S, Zhou J, Begley TJ, DeMott MS, Levine SS, de Crécy-Lagard V, Dedon PC, Cao B. Quantitative mapping of the cellular small RNA landscape with AQRNA-seq. Nat Biotechnol 2021; 39(8): 978–988

[54]

Kong Y, Hu H, Shan Y, Zhou Z, Zen K, Sun Y, Yang R, Fu Z, Chen X. Accurate quantification of 3′-terminal 2′-O-methylated small RNAs by utilizing oxidative deep sequencing and stem-loop RT-qPCR. Front Med 2022; 16(2): 240–250

[55]

Upadhyay G. Emerging role of lymphocyte antigen-6 family of genes in cancer and immune cells. Front Immunol 2019; 10: 819

RIGHTS & PERMISSIONS

Higher Education Press

AI Summary AI Mindmap
PDF (11888KB)

4070

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/