Introduction
Epstein-Barr virus (EBV) is a member of the γ-herpes viruses that infects roughly 95% of adult individuals worldwide; this virus is associated with Burkitt’s lymphoma (BL), Hodgkin’s disease, B and T cell lymphomas, gastric carcinoma, and nasopharyngeal carcinoma (NPC) [
1]. EBV undergoes two distinct life cycles: the latent cycle and the lytic cycle, which are characterized by the expression of different genes in a human host. Four types of the latent form exist on the basis of the expression of various latency genes, namely, types 0, I, II and III. The latent-expressed genes include six EBV nuclear antigens (EBNAs), three latent membrane proteins (LMPs),
Bam HI A rightward transcripts (
BARTs), and small non-polyadenylated non-coding RNAs (
EBER1 and
EBER2) [
1–
3].
The undifferentiated form of NPC is commonly found in the southern Chinese population [
4–
6]. This form shows the most consistent association with EBV. EBV is primarily maintained in type II latency in NPC and thus expresses
EBER1,
EBER2,
EBNA1,
LMP1,
LMP2, and
BARTs [
7–
9]. However, some lytic genes, such as
BZLF1,
BRLF1,
BARF1,
BHRF1,
BALF2,
LF1,
LF2,
BILF1,
BALF3,
BALF4,
BALF5, and
BGLF5, have been occasionally detected in NPC [
10–
12]. Therefore, the expression pattern of EBV genes in NPC seems more complex than previously expected.
Several assays have been used to detect EBV transcripts [
13,
14]. Microarrays and transcriptome sequencing are two primary methods to perform gene expression profiling analysis. Microarrays have served as the basic technical platform of most gene expression analyses because they are stable, practical, and feasible. However, microarrays are limited in terms of their sensitivity, specificity, and ability to identify novel transcripts [
15]. RNA-seq is a massively parallel sequencing technology that remarkably increases the throughput of RNA sequencing and allows global measurements of transcript abundance [
16]. Deep sequencing technology has been applied to analyze the EBV miRNA transcriptome in clinical NPC tissue and has revealed a wide expression range of EBV miRNA, including novel miRNAs [
17,
18]. RNA-seq assay have also provided advantages to detect the expression of EBV genes in BL cells [
13] and gastric cancer cells [
19].
We initially performed massively parallel sequencing to systematically and comprehensively characterize the expression of EBV genes in NPC tissue and NPC cell line (C666-1) that consistently harbors EBV and to enhance our understanding of the roles of EBV genes in the pathogenesis of NPC [
20,
21]. The expression of EBV genes in different EBV-positive cell lines and in NPC tissue was validated through real-time PCR.
Materials and methods
Ethics statement
This study was approved by the Sun Yat-sen University Institute Research Ethics Committee, and written informed consent was provided by each human subject.
Cell lines
C666-1 is a NPC cell line that consistently carries EBV. NPEC2 is an EBV-negative primary normal nasopharyngeal epithelial cell. The NPC cell line CNE2 and the BL cell line Akata(−) are EBV negative; by contrast, B95.8, Raji, Akata(+), and LCL-1 lymphoblastoid cell lines are EBV positive. B95.8 cell line, an LCL of marmoset origin, can express latent and lytic EBV genes constitutively and concomitantly with an EBV type III latency pattern. Raji cell line was derived from BL and was found to exhibit an EBV type I/III latency pattern; these cells initially display an EBV type I latency pattern similar to that of BL cells
ex vivo but switch to a type III latency pattern upon continued
in vitro culture [
21,
22]. Without induction, Akata(+) was observed as a type I latency cell line, and LCL-1 was found as a type III latency cell line. Total RNA was extracted with TRIzol in accordance with the manufacturer’s protocol (Invitrogen, USA).
Tissue samples
Twelve primary NPC tissue samples were obtained and subjected to RNA-seq assay; ten additional NPC samples and two normal control groups were collected and further validated in the Department of Nasopharyngeal Carcinoma, Sun Yat-sen University Cancer Center, Guangzhou, China. Informed consent was obtained from the patients, and approval was provided by the Institute Research Ethics Committee. All of the NPC patients were treated in the Sun Yat-sen University Cancer Center and identified through clinical and pathological diagnosis. The detailed clinical data of the 12 NPC samples for RNA-seq are shown in Supplemental Table 1. Total RNA and DNA from the samples were isolated by using commercial RNA/DNA extraction kits (Omega, USA).
cDNA library preparation and sequencing
The quality of the total RNA was assessed by using an Agilent Bioanalyzer 2100 (Agilent Technologies, USA). In brief, 5 µg of high-quality total RNA with an RNA integrity number (RIN) greater than 7 was used as input. The poly(A) mRNA was isolated by using beads with oligo(dT). Paired-end libraries (n = 14) comprising NPEC2, C666-1, and 12 NPC tissue samples were prepared in accordance with the manufacturer’s protocol by using an mRNA-seq sample prep kit (Illumina, USA). The purified mRNA was fragmented into 200–700 nt strands in a fragmentation buffer. With these short fragments as the templates, a random hexamer primer was used to synthesize the first-strand cDNA. The second-strand cDNA was synthesized in a reaction containing the buffer, dNTPs, RNase H, and DNA polymerase I. Short double-stranded cDNA fragments were purified with a QIAQuick PCR extraction kit (Qiagen, Germany) and eluted with EB buffer for end repair and “A” addition. Subsequently, the short fragments were ligated to Illumina sequencing adaptors. Approximately 200 bp long DNA fragments were gel-purified and amplified through PCR. The amplified library was sequenced through paired-end sequencing by using Illumina HiSeqTM 2000.
Raw reads filtering
The images generated by sequencers were converted into nucleotide sequences by the base calling pipeline. Raw reads were saved in the fastq format. The removal of dirty raw reads was necessary before data were analyzed. Three criteria were used to filter dirty raw reads:
(1) Remove reads with sequence adaptors.
(2) Remove reads with more than 5% “N” bases.
(3) Remove low quality reads, which contain more than 50% QA≤20 bases.
All of the subsequent analyses were based on clean reads.
Sequence data sets
Sequences were aligned to the human genome sequence (hg18 reference genome: NCBI Build 36.1; http://genome.ucsc.edu/cgi-bin/hgGateway) and NCBI Refseq transcripts. If one or both of the paired-end reads could not be mapped to the human genome and transcripts, these reads were classified as unmapped. The unmapped reads were further mapped to the reference EBV wild-type genome (GenBank: AJ507799.2; http://www.ncbi.nlm.nih.gov/nuccore/AJ507799.2). To quantify the expression of EBV genes, we counted the reads that mapped uniquely to the gene. Read counts were normalized on the basis of the length of the gene and the total number of mappable reads according to the RPKM (Reads Per Kilobase of exon model per Million mapped reads) normalization method.
Normalization of gene expression levels
Reads that could be uniquely mapped to a gene were used to calculate the expression levels. The gene expression level was determined as the number of uniquely mapped reads per kb of exon region per million mappable reads (RPKM) according to the following equation:
where C is the number of reads uniquely mapped to the given gene, N is the number of reads uniquely mapped to all genes, and L is the total length of exons from the given gene. For genes with more than one alternative transcript, the longest transcript was selected to calculate the RPKM. The RPKM method could eliminate the influence of different gene lengths and sequencing discrepancy on the calculation of gene expression. Therefore, the RPKM value can be directly used to compare the differences in gene expression among the samples.
cDNA preparation for real-time PCR
RNA samples were treated with RNase-free DNase I (#EN0521; Fermentas, Lithuania) in accordance with the manufacturer’s protocol to remove any DNA contaminant. The DNA-free RNA samples were used as templates to synthesize the first-strand cDNA with Oligo(dT)20 (50 mmol/L) and reverse transcriptase (Invitrogen, Carlsbad, CA). Random hexamer primers were used to synthesize the first-strand cDNA and thus measure the expression of two small non-polyadenylated (non-coding) RNAs, namely, EBER1 and EBER2.
Real-time PCR validation
Quantitative PCR was performed with Power SYBR Green qPCR SuperMix-UDG (Invitrogen, Carlsbad, CA) and aCFX96TM Real-time PCR detection system (Bio-Rad Laboratories, USA), as previously described. The geometric mean of the housekeeping gene GAPDH was used as an internal control to normalize the variability in expression levels. The sequences of the primers used to amplify the indicated genes were designed with Primer Premier 5 (Supplemental Table 2).
Relative expression of EBV genes in NPC tissue
Total RNA and DNA were extracted from the tissue samples. RNA samples were used to verify EBV gene expression. DNA samples were used to determine the relative amount of EBV DNA. The relative expression levels of EBV genes were normalized to the amount of EBV DNA; thus, the potential discrepancies between different samples because of different amounts of cancer cells in tumor masses could be eliminated. The relative expression levels of EBV genes were calculated, as follows:
(1) The relative amount of the EBV genes on DNA (RE1) was normalized to β-globin.
(2) The relative expression of the EBV genes as RNA (RE2) was normalized to GAPDH.
(3) The final relative expression of EBV genes (RE) was calculated as RE= RE2/RE1.
Three independent experiments were performed, and similar results were obtained. Data were presented as mean±SD (n = 3).
Unsupervised hierarchical clustering
The cell line sample (C666-1) and 12 NPC samples from patients (NPC_41, NPC_46, NPC_49, NPC_51, NPC_52, NPC_53, NPC_54, NPC_57, NPC_60, NPC_62, NPC_64, and NPC_66) were subjected to clustering analysis. A total of 68 EBV genes were expressed in at least one sample and used for clustering. For each gene in all of the samples, the RPKM of all genes was added to 1 and then divided by the average gene expression across the samples. The fold change of the expression was subsequently log transformed. For each gene, the logarithmic gene expression was normalized with a mean of 0 and a standard deviation of 1 by row (samples). The normalized expression data were clustered by using the hierarchical clustering algorithm. The mean was compared. A red signal indicates increased EBV gene expression and a green signal corresponds to decreased EBV gene expression.
Differential cellular gene expression patterns associated with EBV
With unsupervised hierarchical clustering, we can divide the 12 NPC samples into two major subtypes on the basis of the EBV gene expression data: the first subtype comprised NPC_49, NPC_52, and NPC_66; the second subtype consisted of the 9 other samples. To investigate the influences of EBV-dependent alterations in tumor signaling pathways, we identified the genes with significantly different expression between these groups by conducting a t-test (P<0.05).
Gene ontology and KEGG pathway enrichment analysis
As a comprehensive set of functional annotation tools, the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.7) [
23] has been used to systematically and integratively analyze lists of differently expressed genes. Gene ontology (GO) terms [
24] are significantly overrepresented in a set of genes from three aspects, including cellular component, molecular function, and biological process. In our work, the identification of significant GO biological process terms and the KEGG pathway enrichment analyses of the identified DEGs were performed using DAVID with thresholds of
P< 0.05 and enrichment gene counts of>2.
Statistical analysis
Pearson correlation coefficient was calculated by using SPSS 16.0 to assess the correlation between the transcript levels of EBV lytic genes and BZLF1 or BRLF1 and to identify a list of genes that were significantly related to BZLF1 and/or BRLF1 (P<0.05).
Pearson correlation coefficient was also determined by using SPSS 16.0 to assess the correlation between the transcript levels of EBV genes valued through RPKM and the relative expression through qRT-PCR in C666-1.
Results
Analysis of EBV gene expression via paired-end transcriptome sequencing
In this study, massive parallel sequencing was performed to assess the transcriptomes of 1 NPC cell line C666-1 (EBV-positive NPC cells), 1 primary cell line NPEC2 (EBV-negative normal nasopharyngeal epithelial cells), and 12 NPC tissue samples. We generated an average of 74 435 071 reads ranging from 52557678 to 85510964 and 6.57 Gb ranging from 3.94 to 7.70 Gb of the sequenced nucleotides per sample. The short oligonucleotide analysis package (SOAP) aligner was used to map an average of 82.22% (range: 77.70% – 87.22%) of the reads to the human genome (UCSC version hg18). The residual unmapped read-pairs were aligned to the EBV genome (the complete wild-type genome of human herpesvirus 4; GenBank: AJ507799.2). For accuracy, we tallied only unambiguous alignments with the best match to the genome accession. On average, 4757 reads (varied from 2317 to 9484) from C666-1 and NPC tissue were aligned to the EBV genome. An average of 31 EBV genes (varied from 16 to 68) were detected, whose abundance was valued by RPKM as a measure of relative gene expression (Supplemental Data 1). We also performed RNA-seq of the data of the EBV-negative cell line NPEC2. None of the reads were matched to the EBV genome. Therefore, the observed expression values of the EBV genes are not caused by the background.
EBV latency gene expression is not restricted to type II latency in some NPC samples
The RNA-seq coverage information of EBV genes was visualized by the distribution of the reads across the EBV genome in C666-1 (Fig. 1B) and NPC tissue (Fig. 1C–1D). Among the NPC tissue, NPC_49, NPC_52, and NPC_66 yielded the highest coverage of the EBV genome (Fig. 1C). The abundance of the reads mapped to the BART region was higher than that in the other region in all of the samples. Similar to the findings of a previous report [
13], our results revealed few intergenic and intron regions; thus, the major reads were not derived from DNA contamination.
Consistent with the results of previous studies [
7–
9,
25], our findings demonstrated that the EBV type II latency genes
EBNA1, BARTs (
BARF0,
A73, and
RPMS1),
LMP1, and
LMP2 were highly expressed in NPC cell line C666-1 and most of the NPC tissue (Fig. 2A; the sequencing libraries were generated from polyA-selected RNA, which precludes the sequencing of
EBER genes). The transcripts of
EBNA2 and the
EBNA3s, which are mainly detected in type III latency, were expressed in C666-1 and several NPC samples (Fig. 2B). Hence, the latency gene expression in some NPC samples might not be restricted to type II latency.
EBV lytic gene expression in NPC
In addition, we observed the expression of lytic genes (Fig. 2B and 2C). Most genes were expressed at levels above the median RPKM values (median RPKM= 3279 in all sequencing data; Supplemental Data 1, Supplemental Tables 3–16); this finding suggested that EBV lytic genes are expressed in NPC, and this result is similar to a recent RNA-seq report on EBV in BL cells [
13]. As previously mentioned, the partial expression of EBV productive cycle-associated lytic genes was previously reported to occur in NPC biopsies
in situ, such as the elements of EA complex:
BZLF1,
BRLF1, and
BGLF5. The subgroup of NPC_49, NPC_52, NPC_66, and C666-1 had more lytic genes that were expressed compared with the remaining NPC samples (Fig. 2C). This finding further confirmed that the four samples exhibited the highest coverage of the lytic gene-enriched region of the EBV genome (Fig. 1B and 1C).
The expression of the immediate-early gene
BZLF1 (Zta) or
BRLF1 (Rta) is sufficient to reactivate EBV infection from the latent to lytic form [
26‒
28]. BZLF1 and BRLF1 are transcribed from the immediate early promoters Zp and Rp, respectively [
26]. Zta is an AP-1-like viral protein that acts at Zta-responsive elements (ZRE) to recruit RNA polymerase II and associated factors. BRLF1 directly binds as a homodimer to BRLF1-responsive elements (RREs) contained within several early lytic viral promoters. BRLF1 binds and activates several of the same early lytic EBV promoters as BZLF1. Both BZLF1 and BRLF1 are required for full expression of the early and late EBV proteins [
29]; as such, the virus spreads. The transcription of
BZLF1 and/or
BRLF1 was evident in 7 out of 12 NPC samples and in C666-1 (Fig. 2D, upper). As expected, the expression of downstream lytic genes (51 genes; Supplemental Data 2, 3) was highly correlated with the expression of
BZLF1 and/or
BRLF1 (Fig. 2D, lower). However, regardless of the expression of
BZLF1 and/or
BRLF1, some lytic genes are highly expressed, including
BALF3,
BALF4,
BALF5,
BILF1,
BNRF1,
LF1, and
LF2 (Fig. 2E), thereby indicating that the expression of these lytic genes might be activated by other factors independent of Zta and Rta.
Validation of the EBV transcripts in EBV-carrying cell lines by real-time RT-PCR
To confirm the RNA-seq results, we examined all EBV transcripts (68 genes) in the EBV-positive C666-1 B95-8, Raji, Akata(+) (carrying recombinant EBV), and LCL-1 lymphoblastoid cells, as well as the EBV-negative NPEC2, CNE2, and Akata(–) control cells. EBERs are polyA(–) transcripts; thus, we analyzed their levels by real-time RT-PCR from cDNA templates generated with random hexamer primers. The EBV in Akata(+), Raji, and B95-8 cells has been reported to be type I, I/III, and III latency, respectively (as described in “Materials and methods”). C666-1 cells have been reported to carry EBV of type II latency, whereas LCL-1 cells have been reported to carry EBV of type III latency. As shown in Fig. 3, the type II latency genes EBER1, EBER2, LMP1, LMP2, and EBNA1 were detected in C666-1 and other EBV-positive cells, with the exception that LMP2 was not detected in Raji cells. Among the latency genes expressed in C666-1 cells, EBER1 and EBNA1 were the most highly expressed, LMP1 and LMP2 were expressed at a moderate level, whereas EBER2 was expressed at low levels. The vast majority of polyadenylated viral RNA in NPC were BamHI A rightward transcripts (BARTs) that are spliced in a complex fashion; thus, these RNAs cannot be unambiguously differentiated by qPCR. In addition, EBNA2 and EBNA3s, which are highly expressed in cell lines with EBV type III latency, were detectable in C666-1 and the Akata(+) type I cell line.
Remarkably, we confirmed that the expression levels of immediate-early genes and various EBV lytic gene in C666-1 cells by real-time PCR. For example, the tegument protein BGLF2, the viral Bcl-2 homolog BHRF1, the dUTPase BLLF3, the kinase protein BGLF4, and the helicase-primase replication protein BBLF4 were expressed in all the EBV-positive cells (Fig. 4). The nucleocapsid protein BCLF1, the viral interleukin 10-encoding gene BCRF1, and the cyclin B1 homolog BDLF2 were highly expressed in the C666-1, B95-8, Akata(+), and LCL-1 cells. The alkaline exonuclease BGLF5 was expressed in all the EBV-positive cells, except the LCL-1 cells. The glycoprotein BILF2 was expressed in B95-8, Akata(+), and LCL-1 cells. None of these genes were detected in the three EBV-negative cell lines. The relative expression levels of the rest of those genes and the other lytic genes are shown in the Supplemental Figs. S1–S6. In addition, we evaluated whether EBV genes transcripts by RNA-seq RPKM data correlated with the genes qRT-PCR value in C666-1. Log-transformed RPKM and log-transformed qRT-PCR of highly abundant (r = 0.71, P = 0.002) and relatively abundant (r = 0.492, P = 0.012) EBV genes are correlated (Supplemental Fig. S7), supporting that RNA-seq provides reliable quantitative estimates of transcript levels in C666-1.
Validation of the EBV transcripts in NPC tissue
To confirm the EBV expression patterns in vivo, we detected the expression of the abovementioned genes, 10 additional NPC biopsies (T1, T4, T7, T12, T14, T5, T11, T35, T36, and T37) and 2 samples of non-cancerous nasopharyngeal epithelial tissue (N7 and N9), by real-time PCR. Consistent with the results by RNA-seq, those latency genes expressed in type II infections, which include EBER1, EBNA1, LMP1, and LMP2, were expressed at a relatively high or moderate levels in most of the tumors (Fig. 5). EBER1 was the most highly expressed latency gene transcripts in all of the NPC samples, and EBER2 was expressed in only one of the NPC samples. The EBNA2 were expressed at a low level in some of the NPC samples, whereas EBNA3s was expressed in half of the NPC samples. Although we could not exclude the potential infiltration of lymphocytes into the NPC samples as one of the sources of EBV transcripts, the uneven distribution of the EBNA2 and EBNA3s in the different NPC samples suggested that any contamination by lymphocytes is unlikely to be a significant factor. EBV transcripts were not detected in the non-cancerous nasopharyngeal epithelial tissue.
As expected, the significant expression of lytic genes was also detected in NPC samples. The tegument protein BGLF2 and the dUTPase BLLF3 were expressed in 5 of 10 NPC samples as shown in Fig. 6, whereas the helicase-primase replication protein BBLF4 and the kinase protein BGLF4 were expressed in the same NPC samples (T1, T14, T35, and T37). The cyclin B1 homolog BDLF2, the nucleocapsid protein BCLF1, the viral Bcl-2 homolog BHRF1, the alkaline exonuclease BGLF5, the glycoprotein BILF2, and the viral interleukin 10-encoding gene BCRF1 were expressed at similar levels in the NPC samples. The relative expression levels of the other lytic genes are shown in Supplemental Figs. S8‒S13. However, BFRF2, BHLF1, and BKRF3 were not detected in the NPC samples.
Clustering of NPC patient groups by EBV gene expression
According to the EBV genome coverage data, two distinct patterns of read distribution are present across the EBV genome region that is enriched by lytic genes. Furthermore, to analyze the patterns of EBV gene expression that characterize the different stages of EBV infection in NPC patients, we performed unsupervised hierarchical clustering analysis of the EBV gene expression data encompassing 12 NPC samples. As shown in Fig. 7, the cluster dendrogram presented two major subtypes of NPC samples according to the EBV expression pattern. In addition, the first subtype of NPC samples (NPC_49, NPC_52, and NPC_66) from the late TNM stages with skull base invasion and advanced metastasis of lymph node presented similar gene expression.
EBV plays a key role in the development of NPC. EBV most likely subverts some of the oncogenic pathways contributing to NPC. To examine whether the different NPC subgroups clustered on the basis of the EBV expression pattern could be separated into different cellular signaling pathways, we investigated the differences in the global gene expression profiles between the subgroups (Supplemental Fig. S14) and conducted GO and KEGG pathway enrichment analysis with DAVID (v6.7 [
23]) for the differently expressed genes. Results revealed that the first subtype of NPC samples (NPC_49, NPC_52, and NPC_66) is correlated with several signaling pathways (Supplemental Fig. S15), including those related to heterotrimeric G-protein signaling, inflammation mediated by chemokine and cytokine signaling, ribosomes, protein metabolism, influenza infection, ECM-receptor interaction, integrin cell surface interactions, integrin signaling, and focal adhesion.
Discussion
Gene expression profiling studies can provide a full view of mRNA expression in biological processes [
30]. We conducted massively parallel sequencing to detect the transcripts of EBV genes in NPC tissue and C666-1 cells that persistently carry EBV and serve as a good model of EBV research in NPC and to understand the roles of EBV genes in the pathogenesis of NPC.
On the basis of the sequencing results, we obtained 68 EBV genes from the sequencing of the cellular polyadenylated transcripts in the EBV-positive cell line C666-1. These transcripts included 10 latency genes and 58 lytic genes. Consistent with previous reports in NPC, our study demonstrated that the major latency gene transcripts with high RPKM values were correlated with type II latency infection genes, such as
EBER1,
EBNA1,
LMP1,
LMP2, and
BARTs; conversely,
EBER2 was only detected in one NPC sample [
31]. The stability of
EBER1 and
EBER2 is different; the half-life of
EBER1 is more than tenfold longer than that of
EBER2 [
32]. This phenomenon may explain the remarkably different expression of
EBER1 and
EBER2 in C666-1 cells and in most of the NPC samples.
EBER1 promotes the growth of BL cells by interacting with L22 [
33]. However,
EBER2 exhibits a stronger transforming ability on lymphoblastoid cell lines (LCL) cells than
EBER1 does [
34]. The different EBV expression patterns in BL and LCL cell lines may account for the different roles of
EBER1 and
EBER2. Nevertheless, studies have yet to determine whether
EBER1 and
EBER2 have distinct functions in NPC.
Type III latency genes, namely,
EBNA3A,
EBNA3B, and
EBNA3C, were detected in C666-1 cells. The expression of
EBNA3C is slightly higher than that of
LMP2 in C666-1 cells. This result was confirmed by the sequencing data of 12 NPC tissue samples, including the type II latency genes in 12/12 samples and the type III latency genes in 4/12 samples. The expression of type III latency genes were further confirmed by qRT-PCR in NPC tissue, with 4/10 samples detected in
EBNA3A, 6/10 samples detected in
EBNA3B, and 7/10 samples detected in
EBNA3C. Taken together, the detection of type III latency genes in C666-1 is not an occasional event. Instead, EBV expressed type III latency genes in NPC. Moreover, Cheung
et al. [
35] showed the presence of the
EBNA2 transcripts in C666-1 cells but failed to detect any protein expression. Srikumar
et al. [
36] showed the presence of
EBNA2 and
EBNA3s at very low levels in microdissected samples. A recent report showed that
EBNA3C attenuates DNA damage signaling and contributes to the transformation of cells by EBV [
37]. This finding indicated that
EBNA3C may play an important role in the tumorigenesis of NPC. Thus, the latency gene expression in NPC seems not necessary to be restricted to the type II latency pattern. However, further characterization of the latency proteins is required.
Unexpectedly, a series of lytic genes with high expression valued by RPKM were detected in C666-1. More importantly, we had similar sequencing results in 12 of NPC tissue samples. The EBV immediate-early genes BZLF1 and BRLF1 were expressed in 7 out of 12 NPC tissue samples, whereas 7 lytic genes (BILF1, BNRF1, BALF3, BALF4, BALF5, LF1, and LF2) were expressed in every specimen and not correlated with BZLF1 and BRLF1. Therefore, the expression of these 7 genes was possibly independent of the activation of the transactivators BZLF1 and BRLF1 and other factors may be involved in inducing these genes. The sequencing data also revealed that 15 EBV transcripts were expressed at levels greater than all of the cellular polyadenylated transcripts in C666-1 cells. These transcripts (Fig. 2) include the latency genes, namely, BKRF1, RPMS1, and A73; the early genes, namely, BRRF1, BHLF1, BMRF1, BKRF4, BLLF2, BMRF2, and BALF5; the late gene BALF4; and the genes of unknown function, including LF1, LF2, BILF1, and BALF3. In the 12 NPC tissue samples, the expression levels of LF2, BALF3, BALF4, and BALF5 were relatively high. These leftward genes are located in the BamHI A region with apparent coverage across the BARTs. Therefore, we performed strand-specific RNA-seq to determine the orientation of these transcripts and to confirm whether these leftward genes are highly expressed in NPC.
None of the transcripts were mapped to the EBV genome in EBV-negative NPEC cells by RNA-seq. Therefore, the presence of EBV transcripts was not caused by background RNA. We validated the gene expression in several cell lines and NPC tissue through real-time PCR. Although the possibility of DNA contamination could not be ruled out, especially in samples with very low EBV gene expression, we treated the RNA samples with DNase I; as a result, the possibility of DNA contamination was sufficiently decreased.
The RNA-seq data indicated that
BHLF can be detected in C666-1 and 3/12 of the NPC samples;
BHLF was derived from multiple promoters, including those induced by re-activation and structural components [
38,
39]. Consistent with our results, previous findings demonstrated that EBV-associated NPC expressed lytic genes, such as
BZLF1,
BHRF1,
BRLF1,
BALF2,
BCLF1,
BGLF5,
BARF1,
BALF1,
BMLF1, and
BLLF1 [
40–
42]. These genes also yield high expression levels, as shown by our sequencing data and validated by real-time PCR results. However, some of these lytic genes and their potential roles in the development of NPC should be further verified at a protein level. We proposed two possible reasons for the expression of lytic genes: first, the lytic genes were expressed in the incubation period; second, a small group of cells have entered the lytic cycle; as a consequence, these lytic genes were highly expressed. Furthermore, we aimed to investigate the possible co-localization of latency genes and lytic genes in NPC cells through immunostaining or flow cytometry.
The assessment of EBV expression profiles in gastric cancer samples through RNA-seq showed that 17% of the samples expressed EBV transcripts; most of the transcripts from the
BamHI A region comprised the majority of EBV reads.
LMP2 and
LMP1 were ranked next to this region, and this finding suggested the potential value of IDO1 as an immunity inhibitor for adjuvant therapies against high EBV-associated gastric carcinoma [
19].
Compared with previous studies on EBV in gastric cancer and BL and on EBV miRNA in NPC, the present research provides the first evidence obtained through direct sequencing to systematically and comprehensively describe the expression of EBV genes in NPC. Our results demonstrate that in addition to the expression of type II latency genes, some type III latency genes and several lytic genes are expressed at high levels in C666-1 cells and NPC tissue. In addition to BARTs that contribute to most of the EBV genes, lytic genes, such as BNLFs and LFs, provided the majority of the EBV reads in the NPC samples. The subtypes of the NPC samples (NPC_49, NPC_52, and NPC_66 in the late TNM stages and with skull base invasion and advanced metastasis of lymph nodes) are correlated with several signaling pathways, including those involved in heterotrimeric G-protein signaling, inflammation mediated by chemokine and cytokine signaling, ribosomes, protein metabolism, influenza infection, ECM-receptor interaction, integrin cell surface interactions, integrin signaling, and focal adhesion. However, the protein level expression of type III latency genes and lytic genes in NPC should be further investigated to confirm traditional concepts. Some genes with high expression or high ratios in NPC tissue may play potential roles in the progression of NPC. Therefore, the function and mechanism of these genes should be explored to develop new targets to diagnose and treat NPC.
Higher Education Press and Springer-Verlag Berlin Heidelberg