1 INTRODUCTION
Renal cell carcinoma (RCC) is a disease where malignant cells formed in the renal tubules and responsible for more than 90% of kidney cancer. There are more than 140,000 annual RCC related-deaths around the world based on the latest data from the World Health Organization. RCC also denotes the sixth and tenth most commonly diagnosed cancer in men and women, respectively [
1]. As a “silent” cancer, RCC is often diagnosed at advanced stages, with inherent resistance to several lines of treatment [
2]. It has several subtypes, which include clear cell RCC (KIRC, ~75%), papillary RCC (KIRP, 15%‒20%), and chromophobe RCC (KICH, ~5%) [
3,
4]. Different RCC subtypes bear their unique molecular profiles that contribute to the complexity of RCC pathogenesis and thus management. Therefore, it needs to be further characterized based on the subtypes in order to develop a better diagnosis and management of RCC [
5].
With our current understanding, the molecular etiology of RCC is still not fully understood. However, familial RCC, which accounts for around 3% of cases, is known to be caused by mutation in von Hippel-Lindau (
VHL) tumor suppressor gene, which is responsible for the regulation of hypoxia inducible factor-alpha (HIF-α) signaling pathway [
6]. Otherwise, most RCC cases are sporadic and associated with epigenetic alteration instead of genetic mutation [
2]. Epigenetic alterations, such as DNA methylation, histone modification, or RNA silencing, have been extensively studied in cancer, including RCC [
7]. For instance, the chromatin modifying genes,
PBRM1 and
SEDT2, are mutated in 69.3%, 53%, and 14.9% of KIRC, KIRP, and KICH, respectively [
8]. Interestingly, these mutations are associated with methylome remodeling in RCC, which are supported by numerous studies. For instance, Chen
et al. provides a link between aberrant DNA methylation and protein expression within each RCC subtype [
9]. There have also been efforts in utilizing DNA methylation for novel classification or proposing prognostic markers [
2,
10]. However, most studies did not correlate their results with the histological subtypes as the major determinant for patient management [
11]. Not to mention, none of them focused on early-stage RCC [
12,
13].
In this study, we identified novel methylation-driven gene markers by differential methylation analysis of three RCC subtypes (KIRC, KIRP, and KICH) in their early stage (stage I) based on the American Joint Committee on Cancer (AJCC) classification in Caucasian population using the TCGA database, while elucidating its clinical relevance by analyzing their prognostic and diagnostic values, as well as exploring their protein-protein interaction and pathway enrichment related to each RCC subtype. We focused on stage I RCC considering its clinical manifestations that are unclear to specific subtypes [
14]. Meanwhile, Caucasian population was chosen due to the data availability in the TCGA. So far, there have been no significant differences in overall incidence and mortality rates of RCC between populations [
15]. This study is also limited to the observation of the direct effect of DNA methylation towards gene expression, meaning that the observed genes were limited to those having an inverse correlation with their methylation pattern.
2 RESULTS
2.1 Differential methylation analysis of stage I RCC tissue samples
In this study, we retrieved clinical data, DNA methylation beta-value data, and selected mRNA expression data of 120 KIRC stage I patients, 118 KIRP stage I patients, and 27 KICH stage I patients. From these patient data, methylation beta-values of 485,577 CpG sites in 64 normal tissue samples, 120 KIRC stage I samples, 118 KIRP stage I samples, and 20 KICH stage I samples were retrieved. Using Fisher’s exact test, we obtained 74 DMRs in KIRC stage I samples, in which all of them are hypomethylated (Fig.1). Meanwhile, 520 DMRs were obtained in KIRP stage I samples, in which 378 of them are hypomethylated and 142 of them are hypermethylated (Fig.1). The complete list of the differentially methylated CpGs is available in the Supplementary Materials. Surprisingly, we did not obtain any DMRs between KICH Stage I and normal samples. We noticed some overlapping CpGs that are hypomethylated in both KIRC and KIRP, in which ATXN1 becomes the most significant. Therefore, we conducted further analysis on ATXN1 expression in both RCC subtypes.
2.2 Differential expression analysis of selected DMRs
After obtaining the DMRs, we continued with the differential expression analysis of the selected genes that correspond to the CpG sites. The mRNA expression data were retrieved from 36 normal tissue samples, 120 KIRC stage I samples, and 118 KIRP stage I samples. After sorting, we also compared the change in expression level with the methylation value. We only used the genes that have an inverse relationship with the methylation value of its corresponding CpG sites, meaning an overexpressed gene should be hypomethylated, and vice versa. From the matched genes, we selected the top 5 overexpressed and underexpressed genes to be compared between the diseased patients and normal individuals. For KIRC, from the 74 DMRs that correspond to 39 genes, 22 differentially expressed genes (DEGs) were obtained, in which only 8 genes were overexpressed, which follows our criteria. Despite there being downregulated genes in KIRC Stage I samples, we did not include such genes since they do not have a negative correlation with their CpG methylation beta-values. We compared the expression level of the top 5 upregulated genes between KIRC Stage I and normal samples (Fig.2), with VIM as the most notable gene expression changes.
Meanwhile, for KIRP, 192 genes were differentially expressed from 259 genes obtained from the 520 DMRs. Unlike in KIRC, 5 overexpressed and underexpressed genes were able to fulfill our criteria in KIRP. We compared the expression level of those 10 genes between cancer and normal tissue samples (Fig.2). TNFAIP6 has the most notable increase of expression in cancer samples, while CASR has the most notable decrease of expression. The selected genes and their corresponding CpG annotations within the genes are shown in Tab.1, by searching the Epigenome Wide Association Studies (EWAS) Atlas database using its probe ID.
2.3 Correlation tests
From those selected genes, we confirmed the association of each gene expression with their CpG methylation value using Pearson’s correlation test (Tab.1). The tests showed an inverse correlation in 4 genes for KIRC, with VIM having the strongest correlation. Meanwhile, 8 genes in KIRP showed a negative correlation, with TNFAIP6 having the strongest correlation. Interestingly, there is no significant correlation between ATXN1 methylation and gene expression in KIRP, while there is a moderately significant correlation in KIRC. This confirms a preliminary validation of DNA methylation-mediated silencing or activation of the selected genes.
2.4 Somatic mutation validation
Prior to the investigation of the biomarker capacity of the selected genes, the catalogue of somatic mutations in cancer (COSMIC) database was used to validate the mutation rate and role of the selected genes in cancer progression. The results are summarized in Tab.2, which indicates no recorded association between all the selected genes with any cancer types. The results validate the novelty of the selected genes, particularly as a potential marker in RCC. There are still recorded occurrences of somatic mutations according to the database. However, since it does not contribute to cancer development, it might not affect the expression of the genes found in our analysis, thus signifying the direct effect of methylation aberration towards gene expression changes.
2.5 Overall survival analysis
In order to select the potential candidate prognostic markers for KIRC and KIRP stage I patients, we conducted OS analysis using the Kaplan-Meier curve while also calculating the hazard ratio and their 95% confidence intervals of each gene. We found that high ATXN1, as well as high RFTN1 expressions, are associated with lower overall survival in KIRC (Fig.3, B), whereas high TM4SF19, as well as high GRAMD1B expressions, are associated with lower overall survival in KIRP (Fig.3, E). Interestingly, there is no significant association of VIM and TNFAIP6 expressions on the overall survival in KIRC and KIRP, respectively, although they showed the strongest correlation (Fig.3, F). Furthermore, one-way ANOVA tests showed that ATXN1 and VIM have the highest upregulation in KIRC compared to KIRP and KICH, while TNFAIP6 and TM4SF19 have the highest upregulation in KIRP compared to KIRC and KICH (Supplementary Fig. S1). This supports our preliminary selection of RCC-specific prognostic markers.
2.6 Receiver operating characteristic (ROC) analysis
The diagnostic values of the selected genes were also investigated using ROC analysis. We analyzed all selected genes with significant methylation-expression correlation from each subtype. The area-under-curve (AUC) cut-off value of 0.8 was used as a threshold for a potential diagnostic biomarker. In addition, we performed ROC analysis for ATXN1 in KIRC since it shows potential prognostic value. For KIRC, only RFTN1 and VIM expressions may be used to differentiate between KIRC and healthy individuals (Fig.3), as well as between other RCC subtypes (Fig.3), whereas ATXN1 has the least ability to distinguish between healthy and KIRC patients (AUC = 0.78). For KIRP, TM4SF19 and TNFAIP6 expressions may be able to differentiate between KIRP and healthy individuals (Fig.3). However, when compared to other RCC subtypes, their AUC values drop below the cutoff value, highlighting its low accuracy in distinguishing between RCC subtypes (Fig.3).
2.7 Protein-protein interaction and functional enrichment analysis
Consequently, we conducted protein-protein interaction studies of ATXN1, HLA-B, RFTN1, and VIM for KIRC as well as TM4SF19, GRAMD1B, and TNFAIP6 for KIRP by using the STRING web tool. The interactome networks were created independently in different RCC subtypes. The multiple protein analysis of ATXN1, HLA-B, RFTN1, and VIM results in a total of 34 nodes and 173 interactions (Fig.4). We found that RFTN1 nodes are not connected to the general network and considered as a singleton. Therefore, we constructed a single network analysis for RFTN1 which results in a total of 24 nodes and 49 interactions (Fig.4). Functional enrichment analysis of these significant nodule interactions were then performed. We showed that these interactions are highly associated with immune response, protein expression, and machinery, cell apoptosis, autophagy, organic acid metabolism, pathways involved in specific diseases, graft-versus-host disease, and influenza (Fig.5).
Meanwhile, for KIRP, the multiple protein analysis of TM4SF19, GRAMD1B, and TNFAIP6 results in a total of 33 nodes and 465 interactions (Fig.4), with all genes forming singletons. Therefore, we performed single protein network analysis for each other singletons. TM4SF19 interaction network results in a total of 27 nodes and 55 interactions (Fig.4), while GRAMD1B interaction network results in a total of 20 nodes and 32 interactions (Fig.4). Functional enrichment analysis of the significant interactions revealed that immune signaling, inflammation, metabolism-related pathways, and metabolite transport are enriched (Fig.5). In general, immune-associated and metabolic pathways are enriched in both KIRC and KIRP. The term names for each GO process or KEGG pathway are provided in Supplementary Tables S1 and S2.
3 DISCUSSION
3.1 DNA methylation role at early stage of different RCC subtype
Genome-wide DNA methylation profiling using Illumina Human Methylation 450k data from the TCGA database revealed distinct DMRs in stage I KIRC and KIRP, but not KICH. This is consistent with a previous study reporting no hypermethylated CpGs in KICH, suggesting the minimal influence of methylation towards its progression [
9,
16]. Interestingly, all DMRs in stage I KIRC are hypomethylated, suggesting that early-stage KIRC may favor gene activation rather than silencing. This is in accordance with a study reporting that KIRC with larger tumor sizes and higher grades are associated with hypermethylation [
17]. Differential expression analysis and Pearson’s correlation tests showed some genes with positive correlation between their methylation and expression, which is contrary to the accepted association. These contradictions are not uncommon, since other factors may also contribute to gene expression changes [
18]. Therefore, we only selected genes with negative correlation.
CpG site annotations were obtained from the EWAS Atlas database, which features extensive knowledge regarding the association of epigenomics and biological features [
19]. Interestingly, we found that most CpG sites are intragenic, since the majority of CpG islands are located at the transcription starting site (TSS) [
20]. Unlike TSS CpG sites that mainly regulate transcription of the corresponding exons, intragenic sites can promote gene transcription of introns and non-corresponding exons, as well as regulate histone methylation, allowing a more delicate regulation [
21]. The latter is more probable since intragenic CpG islands between tissues are more diverse than TSS sites [
20]. Not to mention, an important validation is that based on COSMIC v91 that uses human genome version 38 (GRCh38), none of the genes have reported cancer-associated mutations, thus supporting our findings in which their expression changes are indeed methylation-driven [
22].
OS and ROC analysis showed some genes with both prognostic and diagnostic values, whereas some only serve as either a prognostic or diagnostic marker. This illustrates the ambiguity of the word ‘biomarker’ itself, since not all diagnostic biomarkers can serve as a prognostic biomarker and vice versa [
23]. In addition, none of the selected genes have been reported in recent reviews regarding methylation markers in RCC [
12,
13]. Regardless, subsequent confirmation of these findings is required, such as gene expression and methylation studies using actual RCC cell lines, animal models, or patient specimens.
3.2 Immune response modulation as potential phenotype of early-stage KIRC
Protein-protein interaction studies and functional enrichment analysis showed a strong association with immune response, metabolic pathways, as well as antigen processing in KIRC. These findings are supported by the known hallmarks of KIRC, notably increased immune signature, glycolysis rate, and cytoplasmic lipid droplets [
24,
25]. ATXN1, HLA-B, and VIM create a single interaction network, while RTXN1 forms a singleton.
ATXN1 (ataxin-1) has a main role in spinocerebellar ataxia type 1 (SCA1), a neurological disorder [
26]. Meanwhile,
VIM encodes for a mesenchymal marker, vimentin, an established immunohistochemical marker of KIRC [
27]. The low and indirect interaction between ataxin-1 and vimentin may be attributed to co-expression, thus might explain why
VIM expression is not associated with OS in KIRC although being highly overexpressed. Their interaction has been confirmed by a former co-immunoprecipitation study [
28]. We also highlight the interaction of ataxin-1 and caspase-7 (encoded by
CASP7), one of the executors of apoptosis, along with other functions, such as cell differentiation [
29]. Its activation in renal epithelium is known to be p53-dependent, inferring that its loss-of-function in KIRC is likely to be due to p53 depletion and thus associated with worse prognosis [
30‒
32]. In rat pheochromocytoma,
ATXN1 has also been reportedly upregulated in hypoxic conditions, which is prominent in KIRC [
33]. Paradoxically, ataxin-1 is shown to inhibit epithelial-mesenchymal transition (EMT), while in another study, it promotes cervical cancer proliferation [
34,
35].
RFTN1 (raftlin) is one of the key constituents of lipid rafts, essential for protein trafficking required for cancer progression [
36]. It is also involved in the innate immune response and angiogenesis regulation [
37,
38]. Since lipid rafts play a crucial role in glucose transporter stability, raftlin may be involved in this prominent phenotype of KIRC [
39,
40]. Moreover, raftlin interacts significantly with CDC42BPG, ATG2A, and ATG2B.
CDC42BPG encodes for CDC42-binding kinase, which is overexpressed in RCC [
41‒
43]. However, based on the Human Protein Atlas, its overexpression seems to be associated with better prognosis in RCC [
44,
45]. Both ATG2A and ATG2B are related to autophagy, in which its paradoxical role in cancer is often pronounced [
46]. Comprehensive expression analysis records that they are minimally expressed in normal adult tissues, including the kidney [
47].
3.3 Variation of phenotypes in early-stage KIRP
Despite the three genes (
TNFAIP6, TM4SF19,
GRAMD1B) being differentially expressed in KIRP, they did not have any pathway association, thus interaction networks were constructed individually. The neutrophil degranulation pathway is generally enhanced in KIRP, which is consistent with the previous findings [
48,
49]. This highlights the similar pathway enrichment in both KIRC and KIRP, which is the immune signature.
TNFAIP6 (tumor necrosis factor alpha-induced protein 6) is a secreted protein that responds towards TNF-α signaling and is associated with cell migration. Interestingly, we found that TNFAIP6 interacts significantly with MMP-9, which is closely related to neutrophil degranulation and increased cell invasion. Importantly, overexpression of TNFAIP6 has been associated with worse prognosis in urothelial, colorectal, gastric, and non-small cell lung cancers [
50‒
54]. A study also highlights that downregulation of TNFAIP6 drives EMT in normal renal epithelium, suggesting its possible downregulation in late-stage KIRP[
55,
56].
TM4SF19 encodes for a transmembrane protein related to immune evasion due to its overexpression by cancer-associated fibroblasts [
57]. Although its role in KIRP has not been elucidated, its other family members have been recently pronounced as potential cancer targets [
58]. It has also been associated with resistance to afatinib in non-small cell lung cancer [
59]. TM4SF19 is found to interact significantly with LEPROTL1, SMEK1, and HTRA4. Serum leptin and adiponectin levels, which are partially regulated by LEPROTL1 (leptin receptor overlapping transcript-like 1), have been positively associated with the risk of RCC [
60,
61]. SMEK1 (suppressor of MEK1) is involved in various cellular processes, such as DNA repair, apoptosis, microtubule organization, and cell cycle arrest [
62]. HTRA4 (high temperature requirement protease A 4), a specific marker for preeclampsia, is reportedly overexpressed in glioblastoma multiforme and breast cancer, as well as associated with the regulation of junctional proteins and microtubules [
63‒
65]. Based on the current understanding of TM4SF19 and its interactions, TM4SF19 may have broad functions in KIRP progression.
GRAMD1B (GRAM protein domain 1b) is involved in lipid and protein signaling, along with many unknown functions. Multiple studies suggest a multifaceted role of GRAMD1B in the JAK/STAT signaling pathway, with negative and positive feedbacks in breast and gastric cancers, respectively [
66,
67]. Its overexpression is also associated with chemoresistance in ovarian cancer and
TP53 mutation in non-small cell lung cancer [
68,
69]. We showed that in KIRP, high expression of
GRAMD1B correlates with lower OS although being downregulated. Such paradoxical results suggest that GRAMD1B may have unique and dynamic roles in KIRP.
Despite all the results obtained, this study can be subjected to further validations. The proposed conformational wet-lab study is shown in Fig.6. Moreover, further adjustment towards confounding variables may be needed, such as patient age and comorbidity [
70]. However, so far, there is no significant correlation between age and gene expression across RCC subtypes [
71]. Not to mention, the patient age obtained in this study is also normally distributed (Supplementary Fig. S2). A recent study by Wong
et al. suggested that RCC patients with renal stones, chronic kidney diseases, and cerebrovascular disease, lead to a higher risk of mortality, followed by significant changes in immune metagenes [
72]. Unfortunately, comorbidities cannot be adjusted since due to lack of data for specific subtype. We have also minimized the effect of confounding variables by limiting the data scope to ‘Stage I’ and ‘Caucasian population’.
4 CONCLUSIONS
Overall, this study unravels potential DNA methylation-based subtype-specific biomarkers in KIRC and KIRP. In stage I KIRC, VIM, ATXN1, and RFTN1 gene expression are inversely correlated with their methylation values and have prognostic and/or diagnostic values. These genes are associated with immune signature and metabolic pathways which are known to be altered in KIRC. Meanwhile, GRAMD1B, TNFAIP6 and TM4SF19 gene expression are inversely correlated with their methylation values and have prognostic and/or diagnostic values in stage I KIRP. These genes are responsible for neutrophil degranulation, which correlates with tumor progression and invasiveness, and have also been reported in KIRP. Surprisingly, no significant methylation alterations were identified in KICH. Nevertheless, wet-lab experiments are still needed to further confirm our findings, as a step ahead towards a better molecular characterization and early diagnosis of RCC.
5 MATERIALS AND METHODS
The pipeline of our study is described in Fig.7. The methods consist of data retrieval, differential methylation analysis, differential gene expression analysis, somatic mutation validation, correlation tests, OS analysis, ROC analysis, protein-protein interaction studies, and functional enrichment analysis.
5.1 Data retrieval
Clinical data, methylation beta-value data, and mRNA expression data (normalized counts) of KIRC, KIRP, and KICH Stage I patients were retrieved from The Cancer Genome Atlas (TCGA) database by using the TCGA Assembler [
73] and TCGAbiolinks packages [
74] from April to May 2020. Only patients with complete data (clinical data, methylation beta-value data, and gene expression data) were selected. The patients were filtered based on AJCC pathologic stage, ethnicity, and type of array used. Illumina Human Methylation Array 450k probes were chosen for the methylation beta-value data, whereas normalized counts were chosen for the mRNA expression data.
5.2 Differential methylation analysis
Differential methylation analysis was conducted in R 3.6.3 software by using the ‘edgeR’ package [
75]. Normal tissue methylation data from KIRC and KIRP Stage I patients were merged and compared with data from each RCC subtype, since no normal tissue samples were available for KICH. The beta-values were analyzed using Fisher’s exact test to determine the differentially methylated CpGs between the normal and RCC samples. Log
2 fold change of less than ‒1 or more than 1 and false discovery rate (FDR) of less than 0.01 were used as the cut-off points. Volcano plots of the differentially methylated regions (DMRs) of the CpG sites were constructed in R 3.6.3 software.
5.3 Differential gene expression analysis
Differential gene expression analysis was conducted in R 3.6.3 software by using the ‘edgeR’ package [
75]. Normal tissue expression data from KIRC and KIRP stage I patients were merged and compared its expression with data from each RCC subtype. Only selected genes that correspond to the DMRs were subjected to this analysis, by using Fisher’s exact test. DEG with the FDR < 0.01, and sorted the list based on the log fold change value (from the highest to lowest) and
p-value (from the lowest to highest). Heatmaps of the differentially expressed genes were constructed in R 3.6.3 software. The selected CpG-gene pairs were screened for the CpG site annotation using the epigenome wide association studies (EWAS) Atlas by using the CpG probe ID [
19].
5.4 Correlational tests
Pearson’s correlation tests were conducted between methylation beta-values and relative gene expression (in log10) of selected genes after the differential expression analysis in R 3.6.3 software. The CpG-gene data pairs were determined by matching the patient TCGA ID. Inverse correlations with p < 0.05 were considered as statistically significant.
5.5 Somatic mutation validation
The selected genes were screened for any recorded somatic mutations using the Catalogue of Somatic Mutations in Cancer (COSMIC) database by using the gene symbol [
76]. The percentage of reported somatic mutations and its association with cancer progression were retrieved.
5.6 Overall survival (OS) analysis
In order to evaluate the prognostic values of the selected genes, OS analysis using Kaplan-Meier curves was conducted in R 3.6.3 software. Patients’ clinical data (vital status, time of diagnosis, time of last follow-up or time of death) and mRNA expression data were matched using the patient TCGA ID. The cut-off level for low and high expressions, as well as the 95% hazard ratio (HR) confidence intervals (CI) were determined by using the ‘survival’ and ‘survminer’ packages, as demonstrated by previous study [
77] (Supplementary Tables S3 and S4). 95% CI of HR of < 1 and
p < 0.05 were considered as statistically significant.
5.7 Receiver operating characteristic (ROC) analysis
ROC analysis was performed in two stages and was conducted in R 3.6.3 software by using the ‘pROC’ package [
78]. Primary ROC analysis was conducted to select genes with diagnostic value in discriminating RCC and normal kidney. The mRNA expression values of each gene and the status (cancer or normal) were plotted into the ROC curve to determine its specificity and sensitivity values. AUC value of 0.8 were used as a threshold for acceptable diagnostic value [
79]. The chosen genes were then subjected to the secondary ROC analysis to validate their diagnostic value as subtype-specific marker. The mRNA expression values of each gene and the corresponding RCC subtype (
e.g., KIRC or non-KIRC) were plotted into the ROC curve to determine its specificity and sensitivity values. Genes with AUC value of more than 0.8 were selected as potential subtype-specific diagnostic biomarkers.
5.8 Protein-protein interaction and functional enrichment analysis
The selected gene markers were subjected for protein-protein interaction analysis by using the Search Tool For The Retrieval Interacting Genes (STRING) v11.0 [
80]. Multiple protein analyses were performed first to identify possible interaction between selected genes. If there are disconnected proteins in the network, the genes were subjected to single protein analysis. The network edges are set to show the confidence interval by the thickness of the line. The confidence ranges are low (< 0.4), medium (0.4−0.7), and high confidence (> 0.7). The maximum number of interactors are limited to no more than 30 interactors in the first shell. The interaction map was visualized by using Cytoscape 3.8.0. The most significant nodule interaction is then extracted by using the MCODE application on Cytoscape 3.8.0. The selection is based on the degree cutoff of 10, node score cutoff of 0.2, k-core of 2, and maximum depth of 100. It was followed by functional enrichment analysis for gene ontology (GO) process and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The results of the functional enrichment analysis were filtered for redundant terms with 0.5 redundant cutoff. The interaction network images and functional enrichment table were exported as the result of protein-protein interaction analysis, as previously described [
81].
5.9 Statistical analysis
Other statistical analyses were conducted in R 3.6.3 software, which include normal Q-Q plots for the RCC patient age distribution and one-way ANOVA of the selected gene expressions among different RCC subtypes and normal tissue samples, followed by Tukey post-hoc test. p < 0.05 was considered as statistically significant.
The Author(s) 2021. Published by Higher Education Press.