Introduction
Cancer/testis antigens (CTAs) are frequently described for their restricted expression in gametogenic tissues as well as their expression in multiple cancers[
1]. The functions of CTA genes known to date are usually embodied at physiological (germ line) and pathophysiological (tumors) levels[
2]. CTA genes can regulate tumorigenic signaling transmission, contribute to tumor invasion and metastasis, and modulate sperm metabolism and movement[
3]. For example, CT83 has been shown to promote proliferation, migration, invasion, epithelial-mesenchymal transition, and PD-L1 upregulation in cervical adenocarcinoma[
4]. In gastric cancer, MAGE-A10, and MAGEA11 were both reported to enhance malignant phenotypes, including proliferation, migration, and invasion, with MAGEA11 acting through activation of E2F1 transcriptional activity[
5,
6]. In addition, MAGEA1 was shown to facilitate cervical cancer progression and
in vivo tumor growth[
7]. However, a comprehensive landscape of CTA gene expression levels, mutational status, and prognostic value across broad tumor types has not been precisely clarified.
Currently, immunotherapy applying immune checkpoint blockade, which can eliminate the inhibitory status of immune cells and engage the immune system to combat malignant cells, is reforming cancer therapy. Unfortunately, the effectiveness of immune checkpoint inhibitors (ICIs) in cancers varies greatly across individuals. Emerging data have indicated that approximately 9%–23% of patients exhibit an incredible amount of tumor growth when receiving anti-PD-1/PD-L1 therapy[
8]. Based on this finding, many studies have focused on establishing appropriate biomarkers, such as the immunogenicity score (TIGS)[
9], tumor mutational burden (TMB)[
10], PD-L1 expression[
11], infiltration proportions of cytotoxic T cells[
12] and tumor aneuploidy[
13], to estimate the effectiveness of ICI therapy.
The decisive factor for the effectiveness of immunotherapy is the immunogenicity of cancer cells. Based on the immune-privileged characteristics of the testis, CTAs display high immunogenicity and have been specified as promising immunotherapy targets. Several other highly immunogenic CTAs, such as MAGE-A4, PRAME, and KK-LC-1, have recently emerged as promising immunotherapeutic targets in multiple malignancies. MAGE-A4-positive tumors have been identified across diverse solid tumor types and have shown clinical responsiveness to MAGE-A4-directed TCR-T therapy[
14,
15], while PRAME-directed TCR-T therapy has also demonstrated early clinical activity in PRAME-positive advanced solid tumors[
16]. In addition, KK-LC-1 has been implicated as a potential immunotherapy-related biomarker in primary liver cancer[
17]. Nevertheless, the inadequacy of the previous studies is the unclear understanding of whether CTAs can be utilized to assess immune activity and predict ICI response. We focused on CTAs because they show restricted expression in normal somatic tissues but aberrant activation in tumors, are closely related to tumor immunogenicity, and are frequently regulated by genetic and epigenetic alterations. Compared with a broad set of non-specific cancer-related genes, CTAs therefore provide a more biologically coherent framework for pan-cancer analysis, immunotherapy prediction, and traditional Chinese medicine-related compound exploration.
Recent integrative analyses have also combined public transcriptomic cohorts, molecular subtyping, immune infiltration assessment and experimental validation to identify cancer-related biomarkers and therapeutic mechanisms, supporting the utility of multidimensional frameworks in tumor biology[
18]. In our study, we systematically explored the molecular properties of all CTA genes across 33 cancer types. To investigate the potential biological role of key CTA genes in cancer, we adopted methylation, clinical relevance, cancer pathway, drug sensitivity and consensus clustering analyses for further characterization. In addition, we incorporated a target-driven reverse network pharmacology analysis to infer traditional Chinese medicine-related compounds and herbs linked to key CTA genes, thus extending the translational relevance of CTA-centered therapeutic exploration. We also evaluated the performance of CTAs in predicting immune activities and ICI response. The gene set variation analysis (GSVA) method was applied to calculate the CTA score of each tumor sample, and a logistic regression model and receiver operating characteristic (ROC) analysis were used to assess the prediction accuracy. Our study provides a comprehensive investigation into the molecular characteristics and prognostic value of CTA genes. Most importantly, the CTA score serves as a novel and effective indicator for forecasting the immunotherapy response.
Materials and methods
Collection of CTA genes
We included 188 CTA genes into our research from previously published papers[
19]. GeneCards was used to transform these CTA gene symbols to Entrez Gene IDs. Information on these 188 CTA genes is shown in Supplementary Table 1.
Genome-wide multi-omics data across 33 cancer types across 33 cancer types
We comprehensively integrated 33 different TCGA projects into our research, which are namely Adrenocortical carcinoma(ACC); Bladder urothelial carcinoma (BLCA); Breast cancer (BRCA); Cervical squamous cell carcinoma and Endocervical adenocarcinoma (CESC); Cholangiocarcinoma (CHOL); Colon adenocarcinoma (COAD); Lymphoid neoplasm diffuse large b-cell lymphoma (DLBC); Esophageal carcinoma (ESCA); Glioblastoma multiforme (GBM); Head and Neck squamous cell carcinoma (HNSC); Kidney Chromophobe (KICH); Kidney renal clear cell carcinoma (KIRC); Kidney renal papillary cell carcinoma (KIRP); Acute Myeloid Leukemia (LAML); Brain Lower Grade Glioma (LGG); Liver hepatocellular carcinoma (LIHC); Lung adenocarcinoma (LUAD); Lung squamous cell carcinoma (LUSC); Mesothelioma (MESO); Ovarian serous cystadenocarcinoma (OV); Pancreatic adenocarcinoma (PAAD); Pheochromocytoma and Paraganglioma (PCPG); Prostate adenocarcinoma (PRAD); Rectum adenocarcinoma (READ); Sarcoma (SARC); Skin Cutaneous Melanoma (SKCM); Stomach adenocarcinoma (STAD); Testicular Germ Cell Tumors (TGCT); Thyroid carcinoma (THCA); Thymoma (THYM); Uterine Corpus Endometrial Carcinoma (UCEC); Uterine Carcinosarcoma (UCS) and Uveal Melanoma (UVM). The normalized RNA expression profile data were downloaded from TCGA database. Corresponding clinicopathological informations including stage, survival status, overall survival time (OS), disease specific survival time (DSS), disease free interval time (DFI) and progression free interval time (PFI) were acquired from UCSC Xena.
CTA score calculation
The CTA score of each tumor sample across 33 cancer types was quantified using the GSVA method via the R package “GSVA”[
20]. The gene lists for computing CTA score are provided in Supplementary Table 1.
Gene Set Enrichment Analysis
We divided samples of each cancer type into two groups based on the median value of CTA scores. R package “clusterProfiler”[
21] was used to conduct this analysis individually. Annotated gene sets c2.cp.kegg. v 7.0.symbols.gmt in Molecular Signatures Database (MSigDB) was chosen as the reference gene sets. The nominal P-value estimates the statistical significance of the enrichment score and nominal
P ≤ 0.05 was set as the cut-off criterion.
Consensus clustering, survival analysis and external dataset validation
Based on the expression profiles of 188 CTA genes, three TCGA cohorts (KIRC, COAD, and GBM) were clustered into different groups using the “ConsensusClusterPlus” package in R. The overall survival (OS) differences among different subgroups were compared by Kaplan-Meier analysis and the log-rank test. The external clear cell renal carcinoma dataset (GSE29609) including 39 patients, glioblastoma dataset (GSE83300) including 50 patients, and colon adenocarcinoma ICGC US dataset including 59 patients with corresponding available survival information were obtained from the GEO database and UCSC Xena platform.
Differential expression and mutation analysis
We only assessed cancer types that included corresponding normal samples in our research. The Limma package in R[
22] was used to identify the differentially expressed CTA genes in each cancer type. The genes with adjusted
P < 0.05 are presented in our results. TCGA mutation data of CTA genes were collected from the TCGA data portal. TCGA MAF files were analysed and summarized by R package “maftools”[
23]. Mutation information for netrins was downloaded from Cbioportal.
Clinical relevance analyses and Pancan-eQTL analysis
To investigate whether the expression of key CTA genes was correlated with the clinicopathological features of patients, we utilized the MEXPRESS web tool for statistical analysis[
24]. Expression quantitative trait loci (eQTLs) are regions of the genome containing DNA sequence variants that affect the expression level of one or more genes. In our study, we obtained cis-eQTLs (SNPs affecting local gene expression) for CTA genes across 33 cancers of TCGA from PancanQTL[
25] (Supplementary Table 3). Detailed information about the number of examined patient, the data type and the kind of normalization used in these tools was showed in Supplementary Table 4.
Methylation analysis and CNV frequency investigation
We first conducted differential methylation analysis between TCGA cancer tissues and paired normal tissues for nine key CTA genes. Then, we explored the correlation between methylation and expression and the overall survival difference between the hypermethylation and hypomethylation of key CTA genes. GSCALite was adopted to complete and visualize the analysis above. Furthermore, the correlation between the methylation of nine key CTA genes across 31 cancer types and clinical features was analyzed using LinkedOmics. The copy number variation data of key CTA genes in TCGA database and CCLE database were obtained from Cbioportal. R package “reshape2”, “RcolorBrewer” and “circlize” were utilized to help us visualize these results.
Cell lines and 5-Aza-2-deoxycytidine treatment
Five hepatocellular carcinoma (HCC) cell lines (SMMC-7721, Huh7, HepG2, Hep3B, LM3) and one normal liver cell line (LO2) were all previously established from primary HCC and were cultured in DMEM (Invitrogen, Carlsbad, CA, USA) medium with 10% fetal bovine serum at 37°C in an atmosphere of 5% CO2. To analyze methylation status of key CTA genes, HCC cells were split to a low density (30% confluence) 12 h before treatment. Then cells were treated with 5-Aza-2’-deoxycytidine (DAC, Sigma, St. Louis, MO, USA) at a concentration of 2 μmol/L and 5 μmol/L in the growth medium. After exchanging the growth medium every 24 h for a total 96 h treatment, total RNA was extracted as described below.
RNA preparation and quantitative real-time polymerase chain reaction (RT-qPCR)
Total RNA was isolated using a TRIzol reagent (Invitrogen, Carlsbad, CA, United States). RNA purity and concentration were detected by a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, United States). RT-PCR was performed using a FastStart Universal SYBR Green Master (Servicebio, Wuhan, China) Kit. Actin was used as an internal control. Primers were synthesized by Servicebio (Wuhan, China). Primer sequences were: Actin: Forward 5′- CACCCAGCACAATGAAGATCAAGAT-3′, Reverse 5′-CCAGTTTTTAAATCCTGAGTCAAGC-3′; ODF2: Forward 5′-AACATCGACCTCACAGCCATCA-3′, Reverse 5′-TTGGCAATCACCTGTAGGAACT-3′; Relative expression was calculated using the 2−△△Ct method.
DNA isolation, methylation-specific PCR and bisulfite sequencing
The genomic DNA from LO2 cell lines and other HCC cell lines were extracted using the TIANamp Genomic DNA Kit (TIANGEN, Beijing, China). 2 μg of genomic DNA was then bisulfite-converted with the EZ DNA Methylation-Gold Kit (Zymo Research, Orange, CA, USA) according to manufacturer’s protocol. The CpG islands were predicted utilizing the MethPrimer database. Methylation-specific PCR (MSP) primers were designed according to genomic sequences inside the CpG islands region. MSP primers for ODF2 were as follows: 5′-ATTTTGGAAGGTCGAGGCG-3′(MF) and 5′-CCGAATTCACGCCATTCTCC-3′ (MR); 5′-ATTTTGGAAGGTTGAGGTGGGT-3′ (UF) and 5′-TCCCAAATTCACACCATTCTCC-3′(UR). PCR cycle conditions were as follows: 95 °C 5 min, 1 cycle; 95 °C 30 s, 55 °C 30 s, and 72 °C 30 s, 40 cycles; 72 °C 5 min, 1 cycle and 16 °C 2 min, 1 cycle. MSP products were analyzed on 2% agarose gel with ethidium bromide staining and were confirmed to have methylation if a clear band was visualized in the methylation reaction. Positive control was prepared using genomic DNA treated with CpG methyltransferase (Sss I) (New England Biolabs, Ipswich, MA, USA) following the manufacturer’s directions and negative control was prepared using normal lymphocyte DNA.
Bisulfite sequencing (BSSQ) primers encompassed a 339-bp region spanning across the ODF2 transcription start site (−139 to +200 bp) and included part of the region analyzed by MSP. BSSQ primers were designed as follows: Forward 5′-GAAATATTTGTGTGATTTTGGAGTT-3′, Reverse 5′-TAATCTCCATCTCCTAACCTCATAA-3′.
Pathway analysis, drug sensitivity prediction, and target prediction of traditional Chinese medicine components
To explore the influences of CTA genes on familiar cancer pathways, we conducted pathway activity analysis using GSCALite. PharmacoDB[
26] was used to search CTA gene-related drugs, and adjusted
P < 0.05 and correlation > 0.1 or < −0.1 were set as the cut-off criteria (Supplementary Table 5). In addition, a target-driven reverse network pharmacology strategy was applied to predict traditional Chinese medicine-related compounds and herbs for the nine CTA genes. Briefly, the nine CTA genes were input into the TCMSP database[
27] to retrieve candidate compounds targeting these genes, after which the corresponding herbs containing these compounds were further identified. The 20 compounds and 238 herbs were selected based on documented target–compound–herb associations in TCMSP. Compounds were retained when they were linked to at least one of the nine key CTA genes, and herbs were further included when they contained the corresponding candidate compounds. Therefore, this network was constructed using a target-driven reverse screening strategy rather than empirical herb selection. The resulting target–compound–herb relationships were integrated and visualized using Cytoscape[
28], and target symbols were standardized using UniProt[
29].
Immune signature infiltration level quantification and logistic regression model construction
To investigate the correlations between CTA score and immune signatures, we conducted single-sample gene set enrichment analysis (ssGSEA)[
30] to calculate the enrichment level of 29 immune signatures (respectively represented by a set of marker genes) in each sample[
31]. The gene lists for the quantification of immune signatures are provided in Supplementary Table 2. Based on RNA-Seq gene expression profiles, the ESTIMATE algorithm[
32] was utilized to calculate the immune score, stromal score, estimate score and tumor purity for each tumor sample. To assess the accuracy of different molecular characteristics in predicting immune signature levels (immune score and cytolytic activity), we adopted a logistic regression model with four predictors (TMB, CTA score, tumor neoantigen load and tumor purity) in 10357 TCGA cancer samples. Using TCGA somatic mutation data of 33 cancer types, we calculated TMB as the total number of somatic, coding, base substitution, and indel mutations per megabase of the genome. The tumor neoantigen load data of each TCGA sample were obtained from public literature[
33]. These four predictors were continuous variables. Finally, the logistic regression model was constructed by the R package “rms” and tumors with high (higher than the median value) versus low (lower than the median value) immune signature scores were predicted.
Immunotherapy clinical research collection and analysis
To estimate the contributions of the CTA score in forecasting the clinical response to ICIs, we searched and included two datasets by utilizing TIDE (Tumor Immune Dysfunction and Exclusion), which is a web tool for predicting the clinical response to immune checkpoint blockade based on the gene expression profiles of cancer samples[
34], we identified two ideal datasets which conformed to our criteria. The transcriptomic data of Riaz et al., Cell 2017, were obtained from the GEO dataset GSE91061. This study explored genomic alterations in 68 melanoma patients pre- and post-nivolumab (Nivo; anti-PD-1 agent) therapy. The tumor response of patients was assessed by RECIST criteria[
35]. The Snyder et al. (2017) dataset was obtained from github.com/hammerlab/multi-omic-urothelial-anti-pdl1. This study investigated the characteristics of tumors and the immune system in 29 patients with advanced bladder cancer treated with an anti-PD-L1 drug called atezolizumab. The durable clinical benefit was set as a progression-free survival time longer than six months[
36]. The transcriptomic data of three datasets were utilized to calculate the CTA scores and ESTIMATE scores (immune scores and tumor purity) and extract the expression information of PD-1 and PD-L1. Other clinical information, including response outcome, survival time and survival status, as well as features such as TMB, neoantigen load and cytolytic score, were downloaded from TIDE and GitHub. Receiver operating characteristic (ROC) analysis was used to estimate the predictive power of the CTA score and other immune signatures. The area under the curve (AUC) was calculated for each analysis. To explore the prognostic value of the CTA score, the “survminer” R package was used to identify the best cutoff value. Kaplan-Meier analysis with a 2-sided log-rank test was used to assess differences in the overall survival time between the two groups differentiated based on the cutoff value.
Statistical analysis
Univariate cox analysis was conducted by R package “survival”.
P values < 0.05 were recognized as statistically significant. Correlation analysis was performed using Spearman’s rank correlation method by the R package “ggstatsplot”, unless otherwise specified. Kruskal-Wallis test and Wilcoxon test were utilized to compare the CTA scores among two or three subgroups of different tumors respectively. We conducted all heat map analysis and visualized using R package “ComplexHeatmap”[
37]. All statistical analyses were performed using R v3.6.1.
Results
CTA score calculation in 33 TCGA cancer types
The RNA-seq data of each cancer type were used in GSVA analysis, and the expression level of 188 CTA genes represented the activated frequencies in specific tumor samples. The GSVA algorithm was used to calculate the comprehensive expression enrichment of the CTA genes[
20]. The results of GSVA were determined as the CTA scores. Here, we incorporated > 10,000 samples representing 33 cancer types into our analysis to determine the distribution patterns of the CTA genes. The boxplot and violin plot indicate a relatively low enrichment level of CTAs in KIRP and a relatively high level in PRAD. The CTA scores fluctuated markedly in different cancer types, which indicates the aberrant expression pattern of these CTA genes among different types of cancers (Figure 1A). Univariate Cox regression analyses were conducted to detect whether the CTA scores were correlated with prognosis. The results revealed that the CTA scores were significantly associated with OS in CHOL, COAD, HNSC, KICH, KIRC, KIRP, LGG, LIHC, MESO, SARC, and UCEC (Figure 1B). Tumor mutational burden (TMB) has recently been validated to influence the cancer immunotherapeutic response. Tumor neoantigens are newly expressed antigens on tumor cells generated by viral or cellular proteins. Studies have revealed that more neoantigens may be derived from tumors with higher mutation burdens[
38]. Our correlation analysis indicated that TMB was associated with CTA scores in most tumor types (Figure 1C). Among these significantly correlated tumor types, similar relationships between neoantigens and CTA scores were observed in BLCA, BRCA, CESC, HNSC, LUAD, OV, PRAD, STAD, and UCEC (Figure 1D). The enrichment level of immune signatures was evaluated by the ssGSEA method, and the correlations between immune signatures and CTA scores were assessed subsequently. The chemokine receptor (CCR) signature and major histocompatibility complex (MHC) class I signature showed a relatively weak correlation with CTA scores in all patients (Supplementary Figure S1A, C). Significant associations also appeared at the level of cancer types (Supplementary Figure S1B, D).
To explore the profound value of the CTA score in prognostic prediction, we investigated the difference among different subgroups divided by the median value of the CTA score for OS, DSS, DFI, and PFI times (Supplementary Figure S2). Our results indicated that the OS, DSS, DFI, and PFI times were significantly shorter in the high-CTA score patients than in the low-CTA score patients of KIRC and KIRP.
Pathway associations determined by gene set enrichment analysis
We divided the samples of each cancer type into two groups based on the enrichment levels of the CTA score. R package “clusterProfiler”[
21] was utilized to conduct GSEA analysis. As shown in Figure 2, samples in CHOL and SARC with high CTA scores were enriched in multiple immune-related pathways, including the RIG-I-like receptor signaling pathway, antigen processing, and presentation, the cytosolic DNA sensing pathway and the toll-like receptor signaling pathway. Samples in LUSC, KICH, PAAD, and MESO with low CTA scores were also enriched in such immune-related pathways. Pattern recognition receptors (PRRs), such as Toll-like receptors (TLRs), NOD-like receptors (NLRs), and RIG-I-like receptors (RLRs), play imperative roles in regulating innate immune responses[
39]. These results indicated the distinct roles of CTAs in regulating the efficiency of antigen processing and presentation as well as the activation of the innate immune response. Notably, we observed that the regulation of the autophagy pathway was enriched in 9/33 (27%) TCGA cancer types with different CTA scores, suggesting that CTA genes may be involved in modulating autophagy processes.
Expression patterns and univariate Cox analysis of the CTA genes across different TCGA cancer types
To obtain the expression patterns of the 188 CTA genes, 24 cancer types with paired normal controls were incorporated into our study from TCGA database. The expression of CTA genes was altered significantly across 24 cancer types. Generally, the top 10 genes that had the most dysregulated expression proportions across 24 cancer types were identified, including CEP55 (22/24, 91%), OIP5 (21/24, 87%), PBK (21/24, 87%), NUF2 (20/24, 83%), TTK (20/24, 83%), ATAD2 (19/24, 79%), KIF20B (19/24, 79%), KIF2C (19/24, 79%), ODF2 (19/24, 79%), and NOL4 (19/24, 79%) (Supplementary Figure S3).
To further investigate the clinical relevance and prognostic value of the CTA genes, the associations between CTA genes and patient survival across the 33 cancer types were assessed by univariate Cox analysis. The results showed that all CTA genes were correlated with the overall survival of patients with at least one cancer type (Figure 3). Surprisingly, 9 of 10 genes that had the most dysregulated expression proportions across 24 cancer types also displayed a significant association with patient survival across most cancer types. These nine key CTA genes, including PBK (17/33, 51%), NUF2 (16/33, 48%), KIF2C (15/33, 45%), TTK (15/33, 45%), CEP55 (14/33, 42%), OIP5 (12/33, 36%), ATAD2 (11/33, 33%), KIF20B (11/33, 33%), and ODF2 (10/33, 30%), were selected for subsequent analyses.
Mutation analysis of the CTA genes across 33 kinds of TCGA cancers
We downloaded TCGA mutation data of CTA genes from the TCGA data portal. Mutations in the CTA genes in the 33 cancers were determined by R package “maftools”[
23] and are visualized by R package “ComplexHeatmap” in Figure 4. The CTA genes related to UCEC showed the highest number of mutations, followed by those related to SKCM, COAD, STAD, READ, LUSC, and LUAD. Moreover, at the gene level, we discovered that SPAG9, THEG, ACTL8, ARX, CABYR, FANCA, NUF2, and ATAD2 showed higher mutation frequencies. In addition, TGCT, and UVM showed relatively fewer mutations in CTA genes than other cancers. Moreover, mutations in 9 key CTA genes across 33 TCGA cancers were obtained from Cbioportal (Supplementary Figure S4A). The total mutation rates of ATAD2 (8%) in the above 33 cancers were highest, following by NUF2 (4%), PBK (3%), TTK (2.8%), KIF20B (2.8%), ODF2 (1.9%), KIF2C (1.8%), CEP55 (1.3%), and OIP5 (1.9%). Further analysis displayed the specific hot spots of mutations of these key genes (Supplementary Figure S4B). The hot spot mutation K192Sfs*18/Q193Afs*3 of TTK was identified in seventeen patients, involved in three cancer types (COAD, STAD, UCEC), and encodes a truncated protein. The PBK hotspot mutation R75Q/* occurred in four patients with two cancers (UCEC and COAD), and it was detected in the Pkinase domain. The hot spot mutation G141W of OIP5 was detected in four patients with SKCM, and it occurred in Yippee-Mis18 domain. NUF2 hot spot mutation S340L was determined in eight patients with four cancers (SKCM, STAD, UCEC, and COAD), and CEP55 hot spot mutation D239 was found in six patients with three cancers (GBM, LGG, and UCEC). The hot spot mutation A500T of KIF2C was discovered in five patients with two cancers (STAD and UCEC), and it located in Kinesin domain.
Consensus clustering of CTA genes identified different clusters of KIRC, COAD and GBM patients with significantly different prognoses
According to the expression similarity of the CTA genes, we divided KIRC, COAD, and GBM patients into different subgroups by consensus clustering analysis. We found that patients with COAD in the cluster 1 subgroup had significantly higher OS times than patients in the cluster 2 subgroup (Figure 5A). CTA scores were significantly reduced in patients in the cluster 1 group (Figure 5D). Similar results were also found in an external ICGC US testing dataset (Supplementary Figure S5A and D). The overall survival (OS) differences among different subgroups of KIRC patients were also verified in both the TCGA and GSE29609 datasets (Figure 5B, Supplementary Figure S5B). Additionally, CTA scores were significantly decreased in patients in the TCGA cluster 1 group, while there was no significant difference for patients in the GSE29609 dataset (Figure 5D, Supplementary Figure S5D). In GBM patients, we found that patients in the cluster 1 subgroup had a significantly lower OS times than patients in the cluster 2 subgroup in the TCGA dataset (Figure 5C). Moreover, significantly higher overall survival (OS) times were observed in the cluster 2 subgroup than in the other two subgroups in the GSE83300 dataset (Supplementary Figure S5C). Another interesting result was both in training and testing datasets, patients in cluster 2 displayed a longer OS time than patients in cluster 3 (Log-rank test P value = 0.021 and 0.017 respectively). The CTA scores were significantly different for patients of the three subgroups in both the training and testing datasets (Figure 5D, Supplementary Figure S5D). Corresponding with previous findings that lower-CTA scores patients always had a better prognosis than higher-CTA scores patients, the CTA scores in patients of cluster 2 subgroup were lower than patients of cluster 3 subgroup.
Clinical relevance and Pancan-eQTL analysis of nine key CTA genes
Based on the previous results, we recognized nine CTA genes (
PBK,
NUF2,
KIF2C,
TTK,
CEP55,
OIP5,
ATAD2,
KIF20B, and
ODF2), which were correlated with survival and differentially expressed in most tumor types, as potential biomarkers or prognostic indicators. To further clarify the fundamental roles of the CTA genes in tumor development, the relationships between these nine CTA genes and clinical characteristics were evaluated using the MEXPRESS web tool (Figure 6A). The results indicated that the expression of PBK is related to the PSA level in PRAD. Interestingly, a previous study reported that the elevated expression of PBK emerged more frequently in PCa patients with PSA failure[
40], which is consistent with the findings of our study.
TTK expression is correlated with ER status, PR status, and HER2 level in BRCA. This result is similar to that of a previous meta-analysis that indicated that TTK protein was associated with HER2-positive breast cancer patients. Moreover, we also found that the expression of
CEP55 was associated with histological types in most cancers[
41]. Previously published studies on the function of OIP5 and ODF2 are not adequate. Our study discovered an association between ODF2 and IDH1 mutations in LGG and that OIP5 was related to HPV status in HNSC and biochemical recurrence in PRAD.
The genetic variants that regulate gene expression are called expression quantitative trait loci (eQTLs)[
42]. The determination of expression quantitative trait loci (eQTLs) has been highlighted in explaining the genetic alterations in cancers. We also wondered whether the expression of CTA genes was regulated by eQTLs. Therefore, we screened for Cis-eQTLs (SNPs within 1 Mb of the gene transcriptional start site) correlated with the key CTA genes across 33 cancers in TCGA using PancanQTL[
25]. GWAS (genome-wide association study)-related eQTLs across different cancer types were also determined (Figure 6B). We discovered that the expression of
ATAD2 was affected by several eQTLs in KIRP, BLCA, LUAD, OV, PRAD, and SARC. The expression of
KIF2C was influenced by several eQTLs in BRCA, KIRC, KIRC, LGG, SARC, and THCA. No GWAS traits were correlated with ATAD2, and KIF2C. The expression of
TTK was influenced by several eQTLs in BLCA, BRCA, LIHC, LUAD, PAAD, SARC, SKCM, THCA, and PRAD, including 12 eQTLs in BLCA, BRCA, LIHC, LUAD, THCA, and PAAD that could be associated with GWAS. The expression of
PBK was influenced by several eQTLs in BLCA, BRCA, CESC, COAD, LIHC, LUAD, LUSC, OV, PCPG, and other cancer types, including 29 eQTLs in BRCA, BLCA, CESC, LIHC, LUAD, LUSC, OV, and STAD that could be associated with GWAS. eQTLs in BRCA, CESC, COAD, LUAD, LUSC, OV, and THCA affected the expression of
OIP5, including 11 eQTLs in BRCA and THCA that could be associated with GWAS. Interestingly, we found that the expression of
PBK was influenced by eQTLs in BRCA, CESC, COAD, ESCA, HNSC, KIRC, KIRP, and other cancer types. A total of 61 KIF20B eQTLs were associated with GWAS, of which 60 (98.3%) eQTLs were associated with metabolic traits in KIRC and THCA. All CTA genes associated with Cis-eQTLs are presented in Supplementary Table 3.
CNV alteration frequency and methylation analysis of the nine key CTA genes
To determine whether methylation and genetic alterations participated in the regulation of the aberrant expression of CTA genes, the CNV alteration frequency and methylation data of the CTA genes were incorporated into our study from Cbioportal. Generally, there were extensive CNV alterations in these genes across cancer types (Figure 7A). ATAD2 and NUF2 displayed pervasive CNV amplification across cancer types. In contrast, CEP55 and KIF20B showed abundant CNV deletions. Similar prevalent CNV alterations of these key CTA genes also existed across 24 cancer cell lines (Figure 7B). We found that ATAD2 and NUF2 with CNV amplification displayed significantly higher expression in cancer samples than in paired normal samples. Therefore, the perturbed expression of these two genes in cancers may be related to aberrant CNV alterations. However, there was a discrepancy between abundant CNV deletions and the higher expression of CEP55 and KIF20B in most cancer samples.
We then investigated the methylation information of these nine CTA genes. Surprisingly, the results indicated the downregulated methylation of CEP55 in almost all cancer types (Figure 7C). Furthermore, the expression of
CEP55 and other CTA genes was mainly negatively correlated with methylation (Figure 7D). Specifically, CEP55 was upregulated in LUAD patients, and such ectopic expression may be caused by downregulated methylation but not aberrant changes in CNV frequency. In addition, the correlation between methylation and survival (Figure 7E) showed that survival was better for patients with hypermethylation of CEP55 in KIRP, KIRC and ACC. In addition, methylation of CEP55 was associated with ER status and PR status in BRCA. The methylation of TTK, PBK, ODF2, KIF20B and ATAD2 were related to radiation therapy in TGCT. Methylation of ATAD2, NUF2, ODF2 and TTK were correlated with MSI phenotype in COAD (Supplementary Figure S6). Still in Figure 7E, we found that hypermethylation of key CTA genes sometimes correlates with high survival risk in specific cancer types. In fact, one common characteristic for epigenomes of cancer cells is the high level of methylation in promoters of genes[
43]. And the transcriptional silencing of tumor suppressor genes may be caused by hypermethylation of CpG islands, subsequently leading to malignant transformation and cancer progression[
44]. Therefore, we speculated that CTA genes whose hypermethylation were correlated with high survival risk may play an imperative role in inhibiting cancer occurrence and development. For example, ATAD2 in LUSC. LAML and BLCA; TTK in LUAD and CESC.
Figure 7C indicated that ODF2 methylation was upregulated only in liver cancer and prostate cancer, with the highest degree of upregulation observed in liver cancer. Therefore, liver cancer was selected for further experimental validation because it showed the most prominent ODF2 hypermethylation among the analyzed cancer types and a clear negative correlation between ODF2 methylation and expression. ODF2 methylation was negatively associated with ODF2 expression in liver cancer (Figure 7D). In addition, survival analysis suggested that ODF2 methylation status was associated with patient prognosis in specific cancer types, supporting the potential clinical relevance of ODF2 epigenetic regulation. To provide evidence to support methylation analysis above, we conducted a series of experimental research. We first found the expression of ODF2 was reduced in malignant HCC cell line SMMC7721, Huh7, Hep3B, and up-regulated in malignant HCC cell line LM3 when compared with normal liver cell line LO2 (Figure 8A). We speculated that ODF2 expression was regulated by epigenetic modification. To validate this hypothesis, we determined to characterize ODF2 methylation status utilizing the MSP approach. MSP primers were utilized to amplify a rich CpG site area (Figure 8B). The methylation status of ODF2 in these cell lines were detected by MSP, the results show that ODF2 was partial methylated in five HCC cell lines and one normal liver cell line (Figure 8C). Then we treated these cells with 5-Aza-2-deoxycytidine (5-aza-dC), a DNA methylation transferase inhibitor. After 5-aza-dC treatment, increased ODF2 expression was detected in these cell lines (Figure 8D). To further verify the methylation density in the promoter or exon regions, sodium bisulfite sequencing (BSSQ) was performed in LO2, Huh7, and Hep3B cell lines. Dense methylation was examined in the promoter and exon1 region of ODF2 in Huh7 and Hep3B cells, and unmethylation was observed in LO2 cells (Figure 8E). These results corroborated the up-regulated methylation level of ODF2 in liver cancer and uncovered that the expression of ODF2 is regulated by promoter and exon1 region methylation in liver cancer.
Drugs and pathways related to the key CTA genes across cancer types
To further explore the molecular mechanisms by which the key CTA genes are involved in cancer, we studied the relationship between the expression of individual CTA genes and the activity of 10 well-known cancer-related pathways utilizing GSCALite (Figure 9A–B). All nine CTA genes clearly displayed strong inhibition of the hormone ER and RAS/MAPK pathways and great activation of the cell cycle pathway. PBK and KIF2C displayed the strongest activation of the cell cycle pathway. Existing research suggested that the knockdown of
PBK in colorectal carcinoma cells and breast cancer cell lines exhibited G2/M arrest and increased apoptosis, which supported our results[
45]. Subsequently, we obtained a list of small molecules associated with the CTA genes to determine novel treatments using PharmacoDB[
26]. The five most sensitive cancer drugs are highlighted in Figure 9C and Supplementary Table 5. Detailed information on these drugs was collected from the KEGG Drug database (Supplementary Table 5).
Crizotinib, dasatinib, imatinib and nilotinib are antineoplastic drugs that can inhibit the activity of tyrosine kinase. We found that highly expressed ATAD2 is sensitive to imatinib and nilotinib and that the high expression of OIP5 is tolerable to dasatinib and nilotinib, which indicated that the signaling pathways of ATAD2 and OIP5 may be intimately correlated with the signaling pathways moderated by PDGFRA and KIT. Moreover, the optimal use of inhibitors of the MAPK signaling pathway may effectively treat some tumors with high ATAD2 expression. Gemcitabine and methotrexate are drugs that can manipulate cancer by regulating cancer metabolism. Highly expressed TTK was related and sensitive to these two drugs, indicating that TTK may play a role in tumor metabolism. Irinotecan and topotecan are drugs targeting topoisomerase I, and the upregulated expression of ODF2, NUF2 and KIF2C was sensitive to these two drugs; thus, we can speculate that ODF2, NUF2 and KIF2C may interfere with the progression of DNA replication, transcription and repair. In addition, the elevated expression of PBK was tolerable to lenalidomide, which is a TNF-alpha inhibitor, suggesting that PBK may be involved in regulating the NF-kappa B signaling pathway and apoptosis. Most importantly, we found that the increased expression of CEP55 was tolerable to fulvestrant, an estrogen receptor agonist drug that has been widely used to treat estrogen receptor-positive (ER+) breast cancer patients. Our results indicated that fulvestrant may have a poorer performance in treating ER+ breast cancer patients with a high expression of CEP55.
A herb–compound–target interaction network was constructed to delineate the putative pharmacological connections between traditional Chinese medicine and CTA-related targets (Figure 9D, Supplementary Table 6). This integrative network disclosed extensive yet heterogeneous associations among herbal medicines, bioactive compounds, and nine CTA genes. Collectively, these CTA genes were linked to 238 herbs and 20 compounds, underscoring a multifaceted and potentially convergent regulatory architecture. Among these compounds, coumestrol, quercetin, retinoic acid, paclitaxel, glycerol, and polyethylene glycol were among the recurrent or hub-associated components in the network. Among these targets, CEP55 and OIP5 emerged as the most extensively connected nodes, each showing associations with seven compounds; CEP55 was linked to coumestrol, quercetin, retinoic acid, testosterone, progesterone, ethanol, and estradiol, whereas OIP5 was associated with coumestrol, quercetin, testosterone, D-psicose, clomiphene citrate, lactate, and estradiol. By contrast, PBK exhibited a highly restricted interaction profile, being associated solely with O-feruloylquinate, whereas ODF2 was linked only to retinoic acid. NUF2 displayed connections with four compounds, including Zn(II), paclitaxel, phenol, and glycerol, while KIF2C was associated with paclitaxel and polyethylene glycol. TTK also showed relatively broad compound connectivity, being linked to several compounds, such as tetraethylene glycol, isopropanol, malonate, glycerol, and polyethylene glycol. In addition, ATAD2 was connected to isopropanol and tartrate. Notably, KIF20B did not exhibit any prominent compound association in the present network. Taken together, these findings suggest that CTA genes differ substantially in their compound-binding profiles, with CEP55 and OIP5 constituting the principal hubs within the network and thus representing putative key mediators of the multitarget pharmacological effects of traditional Chinese medicine. These results further indicate that the CTA-centered herb–compound–target network has a hub-like structure, with CEP55 and OIP5 serving as priority nodes for future experimental validation, whereas other CTA genes may have more limited compound connectivity.
Prediction of immune signatures by TMB, CTA scores, tumor purity, and neoantigen load in pan-cancer samples
According to our logistic regression model with four predictors (TMB, CTA score, tumor purity, and neoantigen load), we assessed the efficiency of the CTA score in predicting cytolytic activity and immune infiltration levels (immune score) across pan-cancer samples utilizing R package “rms”. The results indicated that three of the four predictors significantly forecasted immune cytolytic activity: CTA score (β = −2.149,
P = 2.0 × 10
−11), tumor purity (β = −29.1,
P = 2.0 × 10
−16), and neoantigen load (β = 0.0004856,
P = 0.00613). Thus, we can conclude that both the CTA score and tumor purity were negative predictors, while neoantigen load was a positive predictor of immune cytolytic activity. There were similar results in predicting immune infiltration levels; the CTA score and tumor purity were negative predictors, with β coefficients of −1.417 (
P = 6.52 × 10
−10) and −12.0 (
P = 2.0 × 10
−16), respectively, and neoantigen load was a positive predictor, with a β coefficient of 0.000898 (
P = 1.89 × 10
−5) (Figure 10A). Significantly, a previous study proved that impaired cytolytic activity was correlated with a lack of clonal neoantigens in lung adenocarcinoma[
46]. Furthermore, the high neoantigen load in microsatellite instability (MSI) tumors facilitated the high infiltration levels of immune effector cells and augmented antitumor immune responses[
47]. These previous findings support the biological plausibility of our logistic regression model.
Prediction of the clinical response to ICIs by the CTA score
Recently, a primary concern of immunotherapy has been the ICI response rate of cancer patients. A considerable number of studies have focused on predicting the immunotherapy response. Multiple factors have been revealed to influence ICI effectiveness, such as the degree of cytotoxic T cell infiltration[
48], TMB or neoantigen load[
38], interferon signaling[
49], and tumor aneuploidy[
50]. Jiang et al.[
51] developed a computational framework and introduced a novel biomarker called TIDE to realize ideal predictions in melanoma and lung cancer. In our selected testing datasets in Methods section, ROC curve analysis was utilized to estimate the accuracy of the CTA score in predicting the clinical response to ICI therapy (Figure 10B and 10D). We found that the CTA score achieved better performance than TMB in the melanoma dataset (GSE91061) and urothelial cancer dataset (Snyder et al. (2017). Moreover, the CTA score outperformed other immunotherapy biomarkers based on gene expression profiling, including the immune score, PD-1, PD-L1, and tumor purity in the urothelial cancer dataset. In these datasets, survival analysis was further employed to investigate the prognostic value of the CTA score. The results indicated that patients with high CTA scores had poorer overall survival than patients with low CTA scores in melanoma (Figure 10C). Moreover, in the urothelial cancer dataset, compared with the higher CTA score group, the lower CTA score group had significantly better overall survival and progression-free times (Figure 10E–F).
Discussion
It has recently become clear that CT antigens are promising targets for immunotherapy due to their testis/tumor-biased expression patterns. In addition, the promising findings that CTAs have implications well beyond tumor progression and can enable the high metastatic capability of tumors suggest that CTAs may not merely be limited to applications in immunotherapy but may provide insight into undiscovered features of tumor biology as well as other novel combination therapies. The aim of this study was to illuminate the systematic molecular characteristics and clinical implications of the CTA genes and to assess the role of the CTA score in predicting ICI response across 33 cancer types. Our primary conclusions are as follows. (1) The CTA score signature we developed can facilitate immune activity estimation and ICI response prediction. (2) The expression profiles of the CTA genes contribute to classification confirmation, and different subgroups in KIRC, COAD, and GBM influenced patient prognosis. (3) The expression of CTA genes may be regulated by multiple mechanisms at the genetic, transcriptional, and posttranscriptional levels. (4) Several drugs may be selected as candidates for combination therapy based on the expression of CTA genes in specific tumor types. Our results proved the effectiveness of using bioinformatic algorithms and computational biology methods to establish novel predictive signatures and elucidate the molecular biology mechanisms of the CTA genes in cancer.
Our study was the first extensive analysis to integrate the expression levels of CTA genes for constructing CTA score-related signatures. The CTA scores in each tumor sample we calculated can reflect the frequency of CTA gene expression from one side. The CTA scores displayed a low distribution in two subtypes of renal cell carcinoma (KIRP and KICH) and a relatively high distribution in uveal melanoma (UVM). This finding was broadly consistent with those of previous studies that classified melanoma as “CT-rich” tumors while classifying renal cell carcinoma as “CT-poor” tumors[
52,
53]. The discrepancy is PRAD, which showed the highest distribution of CTA scores across all cancer types in our results and was classified as a “CT-intermediate” tumor. A possible explanation for this might be that we included newly discovered CTA genes in our analysis. In addition, we investigated the relationships between CTA scores and TMB and neoantigen load. In accordance with the present results, previous studies have demonstrated that high TMB is related to the increased expression of many CTA genes in cancer[
19]. Furthermore, we discovered that the CTA score has prognostic value in multiple cancer types, especially in KIRC and KIRP. Patients with high CTA scores exhibited a shorter survival time. Moreover, we identified a negative correlation between CTA scores and MHC class I molecules in KIRC. It is possible to hypothesize that the inactivation of the immune system, which is probably caused by defective antigen processing and presentation or other mechanisms such as immune evasion, led to the poor prognosis of patients with high CTA scores. Mechanistically, the association between CTA expression and immune escape may be interpreted from several complementary perspectives. First, although CTAs can provide tumor-restricted immunogenic antigens, effective immune recognition requires intact antigen processing and presentation. Therefore, tumors with high CTA expression but reduced MHC expression or impaired antigen presentation may acquire an immune-evasive phenotype, in which antigen abundance is not translated into effective T-cell recognition. Second, CTA-high tumors may represent a more genomically unstable and epigenetically dysregulated state, which could be accompanied by adaptive immune resistance, immune checkpoint activation, or exclusion of cytotoxic immune cells. Third, several CTA-related genes identified in this study were associated with cell-cycle activation, DNA damage response, methylation changes and cancer-related pathways, suggesting that CTA expression may reflect not only tumor antigenicity but also broader malignant programs that reshape the tumor immune microenvironment. These findings may explain why high CTA scores were associated with poorer prognosis in several cancer types despite the theoretical immunogenicity of CTA molecules.
The effectiveness of the immune response largely relied on the identification of cancer cells expressing CTAs by antigen-specific T cells, whereas the co-occurrence of antigen and MHC class I and II molecules should be an essential prerequisite. This highlighted the intimate relationships between the CTA score and antigen processing and presentation in our GSEA results. Most interestingly, although CTA scores were correlated with autophagy related pathway in only 9/33 (27%) TCGA cancer types, the regulation of autophagy by some CTA genes may still be imperative for cancer occurrence and progression in specific tumor types.
Notably, our mutation analysis found that SPAG9, THEG, ACTL8, ARX, CABYR, FANCA, NUF2 and ATAD2 showed higher mutation frequencies across 33 cancer types. Consistent with this finding, several reports have revealed diverse mutational events of ATAD2 in breast cancer and lung cancer[
54,
55].
The expression levels of the CTA genes were heterogeneous across different tumors. Their expressions are modulated by epigenetic systems such as DNA methylation and histone modification. Additionally, nonepigenetic factors such as CNV alteration, transcription factors, signaling pathways and other CTAs can also modify the expression levels of the CTA genes[
3]. Next, we would carefully discuss the possible mechanisms involved in gene expression regulations of nine key CTA genes. The regulation of
CEP55 expression was of great interest to explore. We first found the upregulated expression of
CEP55 across most cancers. Then, we identified abundant CNV deletions and the downregulated methylation of CEP55 in most cancers. In addition, the negative correlation between
CEP55 expression and methylation indicated the imperative roles of methylation modification in regulating
CEP55 expression. Another striking result was that CEP55 exhibited the strongest inhibition of the hormone ER pathway and had a significant correlation with ER and PR status in BRCA. Most significantly, we speculated that the increased expression of
CEP55 in ER
+ breast cancer patients may probably lead to a minor response to an estrogen receptor agonist called fulvestrant. ODF2 encodes one of the major outer dense fiber proteins. The aberrant expression of
ODF2 in liver cancer deserved intensive investigation. In present study, we found the expression of
ODF2 is regulated by promoter and exon1 region methylation in HCC. However, whether the dysregulation of ODF2 could modulate cancer occurrence and progression and whether the methylation of ODF2 could be associated with tumor differentiation were still intriguing questions in future studies. Although our current experiments confirmed the epigenetic regulation of
ODF2 expression in HCC cell lines, the functional effects of ODF2 on tumor proliferation, invasion, metastasis, and
in vivo tumor growth remain to be determined. Further gain- and loss-of-function assays, as well as animal experiments, will be necessary to clarify whether ODF2 directly influences HCC tumor biology. A strong relationship between the frequent amplification and overexpression of
ATAD2 has been reported in the literature[
56]. Knockdown of
ATAD2 suppressed the proliferation of ovarian cancer cells by deactivating mitogen‑activated protein kinase (MAPK) pathways[
57]. Thus, combined with our findings, we hypothesized that the optimal use of inhibitors of the MAPK signaling pathway, such as imatinib and nilotinib, may effectively treat some patients with high ATAD2 expression. The correlation analysis of expression-eQTL-GWAS indicated that the regulatory sites of NUF2 are associated with low-grade glioma. The expression of
NUF2 was a risk indicator for prognosis and was related to IDH1 mutations in LGG. KIF20B and KIF2C are all members of the kinesin family of proteins. The current results of pathway and drug sensitive analysis of KIF2C match those observed in earlier studies indicating that KIF2C was a new player in the DNA damage response and represented a new mechanism that controls DNA double-strand break dynamics and repair[
58].
In parallel with the genomic, epigenetic, and pharmacogenomic analyses, our herb–compound–target network furnished an additional layer of translational evidence by nominating several natural compounds as putative upstream regulators of key CTA genes. Notably, CEP55 and OIP5 were the two most compound-enriched nodes in this network. CEP55 was connected to coumestrol, quercetin, retinoic acid, testosterone, progesterone, ethanol, and estradiol, whereas OIP5 was associated with coumestrol, quercetin, testosterone, D-psicose, clomiphene citrate, lactate, and estradiol. Although these associations are presently predictive, they appear biologically plausible. In breast cancer, CEP55 has been shown to be upregulated and functionally implicated in cell proliferation, migration, and mitotic fitness[
59–
61], whereas
OIP5 is overexpressed in breast tumors and promotes breast cancer progression, at least in part, through the miR-139-5p/NOTCH1 axis[
62,
63]. Particularly noteworthy is that several CEP55-linked compounds identified in our network are steroid- or hormone-related molecules, such as testosterone, progesterone, and estradiol, a pattern that is conceptually concordant with the central role of endocrine signaling in breast cancer biology. Collectively, these observations suggest that network-prioritized herbs harboring CEP55- or OIP5-associated compounds, such as Gao liang jiang, Zi he che, Sang ye, and Niu xi, may merit prioritization as candidate adjuncts for tumors with aberrant
CEP55/OIP5 expression, particularly in hormone-related contexts.
From the perspective of traditional Chinese medicine, this network provides a hypothesis-generating bridge between CTA-related tumor biology and multi-component, multi-target pharmacological regulation. Some of the predicted compounds, such as quercetin, coumestrol, retinoic acid, and paclitaxel, have been reported to exhibit antitumor or tumor-regulatory activities in previous studies. In addition, natural products and herbal medicines may regulate tumor-associated gene methylation through epigenetic mechanisms, including modulation of DNA methyltransferases, histone modification enzymes, and non-coding RNA networks. Therefore, integrating CTA methylation analysis with a herb–compound–target network may help identify molecular targets, candidate compounds, and testable hypotheses for the modernization and mechanistic exploration of traditional Chinese medicine. However, the current study remains limited by its computational prediction strategy and lacks direct experimental evidence that the predicted herbs or compounds regulate CTA methylation or CTA-related tumor biology. Future studies should verify these candidates using target expression assays, methylation assays, tumor functional assays, animal models, and clinically relevant samples.
Consistently, recent transcriptome-based prognostic models have integrated molecular subtyping with immune microenvironment features, including immune infiltration, MSI, TMB, checkpoint expression and HLA profiles, highlighting the value of gene-signature scores for stratifying tumor immunity and prognosis[
64]. In our study, we identified CTA as an effective biomarker for ICI response prediction. Compared with previously reported biomarkers or prediction models, the CTA score may have several potential advantages. First, it is based on a biologically interpretable gene set rather than a purely statistical gene signature, because CTAs are directly linked to tumor-restricted antigenicity and immunotherapy-related mechanisms. Second, the CTA score can be calculated from transcriptomic data and is therefore compatible with large public datasets and clinical RNA-seq cohorts. Third, unlike single biomarkers such as PD-1, PD-L1, TMB or tumor purity, the CTA score integrates the coordinated expression pattern of a group of tumor-associated antigen genes and may capture a broader dimension of tumor immunogenicity. In our testing datasets, the CTA score achieved better performance than TMB in predicting ICI response and also outperformed several expression-based immune indicators in the urothelial cancer dataset. Nevertheless, the predictive performance of the CTA score was not uniformly strong across all datasets, indicating that it should not be regarded as a stand-alone clinical predictor at this stage. Future studies may combine the CTA score with established biomarkers, such as TMB, immune infiltration, checkpoint expression and antigen presentation features, to construct a more robust composite model.
Several limitations should be acknowledged. First, this study was mainly based on retrospective public datasets, and the clinical value of the CTA score requires further validation in prospective cohorts. Second, the immunotherapy cohorts used for model evaluation were limited in sample size and cancer-type coverage. Third, the herb–compound–target network analysis was based on in silico prediction and requires experimental validation. Therefore, these findings should be regarded as hypothesis-generating and require further experimental and clinical confirmation.
In conclusion, our analysis provides a systematic portrait of the molecular features of CTA genes according to multidimensional data of genetics, epigenetics, and pharmacogenomics. We not only discovered that genomic alterations, epigenetic modifications and signaling pathways could regulate the expression of CTA genes but also provided a novel CTA score signature for predicting the immune therapy response. These findings have important implications for predicting immunotherapy response and applying combination therapies. Such a combination therapy could target CTA gene-related pathways or upregulate/downregulate CTA gene expression through potential methods such as DNA methylation inhibitors in specific cancer types. Additionally, our study provides fundamental insights for basic studies to explore the exact role of these CTA genes in carcinogenesis and progression.
The Author(s) 2026. This article is published by Higher Education Press at journal.hep.com.cn.