1 Introduction
Despite intensive research in the past decades, the prognosis for most patients with glioma remains poor [
1]. Detailed characterizations of genomic alterations have not identified subtype-specific pathways of pathogenesis and targetable vulnerabilities [
2]. Tumor initiation is critically dependent on the constellation between the driving genomic alterations and the susceptible cells of origin (COO) in a seed-versus soil manner [
3,
4], and gliomas that originate from different COOs are expected to exhibit distinct biological features and clinical manifestations [
5]. Based on histological similarities with different putative COOs and their presumed levels of differentiation, adult gliomas are traditionally classified as astrocytoma or oligodendroglioma and further stratified into II–IV malignant grades [
6]. Gliomas of the same histological subtypes and grades exhibit extensive biological and clinical heterogeneities [
7]. The integration of canonical genomic alterations, including mutations in isocitrate dehydrogenase 1/2 (
IDH1/2), chromosome 1p19q co-deletion, concomitant gain of chromosome 7, and loss of chromosome 10, has resulted in a histomolecular classification scheme for the diagnosis of adult gliomas into three main categories, namely, IDH mutant astrocytoma (harboring no 1p19q co-deletion) or oligodendroglioma (harboring 1p19q co-deletion), and IDH-wild-type glioblastoma (GBM) [
8,
9]. This classification scheme has improved the risk stratification and objectiveness of glioma diagnosis. However, the neural lineages and differentiation stages involved in these subtypes and the effects of intratumor heterogeneities in copy number variations (CNV) and single nucleotide variations (SNV) on neural developmental pathways remain a crucial unanswered question.
Transcriptome-based classification is expected to reveal the cell states and integrate the effects of genomic and epigenetic alterations on COOs within a tumor [
10], and may thereby provide an alternative platform for the search of subtype-specific vulnerabilities. Based on unbiased transcriptomic analyses, TCGA has identified a classifier of 840 genes and developed a classification scheme that assigns GBMs into four molecular subtypes, namely classical (CL), mesenchymal (MES), neural (NL), and proneural (PN) subtype [
11]. The TCGA subtyping strategy has been widely adopted in the field, but the subtypes are not closely integrated with the known mechanisms of neural development pathway, these subtypes tend to be affected by tumor microenvironment [
12] and unstable during glioma progression [
12,
13]. Multiple subtypes co-exist in the same glioma [
7,
13–
15]. Clinically, subtype-specific prognosis has been questioned [
16,
17].
We hypothesize that key signaling pathways that converge between neural development and glioma genesis might enable the molecular classification of gliomas. Based on the identification of evolutionarily conserved gene modules co-expressed with EGFR (EM, 29 genes) or PDGFRA (PM, 40 genes) and their reciprocal expression pattern in glioma transcriptome, we proposed an EM/PM classification scheme that is applicable to all adult diffuse gliomas of WHO grades II–IV [
18,
19]. The EM and PM members are critically involved in fate decision during gliogenic switch from neural stem cells (NSC) and the specification and maintenance of oligodendrocyte precursor cells (OPC), respectively [
18,
19]. They are frequently amplified or deleted in glioma genome. Independent of the histological subtypes and grades, adult gliomas can be assigned into the EM or PM subtype with distinct overall survival (OS), age of diagnosis, canonical genomic alterations, and correlation to neural lineages [
18,
19]. By using cohorts of bulk samples from our own and from TCGA, we here show that while IDH wild-type adult gliomas correspond to the EM subtype, IDH mutant adult gliomas (astrocytoma and oligodendroglioma) correspond to the PM subtype. By using cohorts of paired longitudinal or multi-region samples, single-cell RNA-seq (SC-RNAseq), and RNAscope analysis, we demonstrate that despite pervasive genomic heterogeneities, the EM/PM subtypes are robustly maintained between the primary and recurrent samples from the same patient and across multiple regions and individual cells of the same glioma. Adult gliomas progress in an EM/PM subtype-specific mode fueled by a gene network spanning elevated activities of cell proliferation, genomic instability, and tumor microenvironment. Malignant cells in the EM gliomas resemble cells in the NSC compartment, whereas malignant cells in the PM glioma resemble cells in the OPC compartment. Our findings of the robust neural cell states in the EM/PM molecular subtypes would provide a conceptual framework to alleviate hitherto conceived biological and clinical heterogeneities and expediate the search for subtype-specific vulnerabilities in gliomas.
2 Materials and methods
2.1 Longitudinal samples from Chinese Glioma Genome Atlas (CGGA)
Paired longitudinal glioma samples and the corresponding patient clinical information were obtained from the CGGA including 70 patients treated at Beijing Tiantan Hospital or Beijing Sanbo Brain Hospital between 2006 and 2018 under ethical permission number IRB KY2013-017-01. RNA-seq based transcriptome data were available for each sample. Data for WES were available for 49 samples with matching germline control. The processed RNA-seq and WES data are available at the CGGA website. Clinical data are presented in Table S2.
2.2 Multi-region sampling
By using the magnetic resonance imaging (MRI) data, we constructed a spatial map of glioma for each patient. Multi-regional sampling was performed based on our spatial map during the resection of 12 patients. Two to seven samples with a diameter of 3.0–10.0 mm were obtained from the tumor core, avoiding regions with necrosis, edema, and bleeding. For tumors with a maximum diameter below 50 mm, between 50–100 mm, and longer than 100 mm, samplings of no more than 4, 4–6, and 6–8 were performed, respectively. The distance between the sampling points was generally greater than or equal to 40 mm. The samples were stored in liquid nitrogen prior to RNA-seq and WES. For each patient, peripheral blood was obtained as the germline control for WES. Our study was approved by the ethical committee of the Affiliated Hospital of Hebei University (HDFY-KL-LL-2018-17). The clinical data for the tumor samples used in this study are presented in Table S4.
2.3 Data sets from previous reports
RNA-seq, WES, and SNP6.0 data from bulk samples and the corresponding clinical information from TCGA samples were obtained from Genomics Data Commons (GDC) portal. The ABSOLUTE purity data of the TCGA samples were downloaded from cBioPortal website. The transcriptome data of the primary/recurrent glioma pairs were obtained from gene expression omnibus (GEO) for GSE4271 [
20], GSE62153 [
21], and GSE42670 [
22], from SRA under SRP074425 [
13], and from European Genome-phenome Archive repository under EGAS00001001800 [
13] and EGAS00001001041 [
23].
The transcriptome matrix and the results of the four GBM subtypes of TCGA were obtained from the supplementary data of Verhaak
et al. [
11]. Multi-regional RNA-seq data of 122 samples from 10 GBM patients were downloaded from IVY GAP. The corresponding SNP6.0 data and clinical annotation information were obtained from the original report [
24]. SC-RNAseq data sets GSE109447 [
25], GSE70630 [
26], GSE84465 [
27], GSE89567 [
28], and GSE131928 [
15] were also downloaded from GEO.
2.4 Identification and analysis of NT identifier
The transcriptome data of 521 glioma and 21 epileptic samples in the Rembrandt data set were downloaded and processed as previously described [
18]. By using Qlucore Omics Explorer 3.5 (Qlucore AB, Lund, Sweden), the transcriptome data from 21 epileptic samples and 65 EM
lowPM
low glioma samples were compared using
t-test at
P = 1.13E–10 and
q = 1.11E–7. The expression patterns of 41 genes enriched in the epileptic samples were further analyzed in all 542 samples. After excluding genes that are also enriched in the EM gliomas based on ANOVA test (
P = 6.20E–9,
q = 6.90E–11), 35 genes enriched in the non-tumor brain tissues (referred to as NT identifier) were identified. GO analysis for the 35 NT identifier was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 [
29,
30]. The GO terms are summarized in Table S1.
2.5 Unsupervised consensus clustering analysis for glioma samples from CGGA or TCGA
Unsupervised consensus clustering, which is implemented in R package ConsensusClusterPlus, was used to identify stable clusters across the glioma transcriptomes from CGGA or TCGA. The FPKM/TPM expression data of NT-identifier, EM, and PM module genes were log2 transformed, and then z-scored prior to ConsensusClusterPlus analysis by using default parameters. The candidate number of clusters (k) was first determined by inspecting consensus matrix plot and the shape of CDF curve together with the delta area under the CDF curve. For the data shown in Fig.1 and Fig.2, clean consensus matrix and the bimodal shape of CDF plot (close to 0 or 1) were observed at k = 3, 4, 5. However, tracking plots showed that at k = 4 or 5, emerging new clusters were within a single cluster of k = 3 with enriched expression of NT identifier. Therefore, k = 3 was chosen.
2.6 Survival analysis
OS was defined as the length of time between the first surgery and death or the last follow-up contact. For longitudinal pairs, PFS was defined as the interval between the first and second surgery. The Kaplan–Meier survival curve and log-rank test were analyzed using Prism 5.0.
2.7 Analysis of copy number alterations and gene mutations in TCGA samples
The CEL files and copy number segment files generated from the SNP6.0 array of gliomas and matching blood samples were downloaded from GDC. The APT-PennCNV-ASCAT workflow was performed to process CEL files. GISTIC2.0 [
31] was used to analyze and visualize the somatic copy number alterations of glioma samples with an amplitude threshold set at ± 0.2. Somatic mutation data generated from WES by using all four mutation detection workflows (SomaticSniper, MuTect2, VarScan2, MuSE) were downloaded from GDC, mutations detected by at least two workflows were included.
2.8 Single sample prediction (SSP) of the EM/PM subtypes in samples from previous reports
For all samples, metagene projection- and nearest centroid-based SSP were first performed. The freely available R-code from The Broad Institute was used for metagene projection analysis [
32]. The 247 core samples from TCGA and 183 core samples from GSE4290 defined in our previous study [
18] were used to build the EM/PM classification model set for the mRNA-seq and array platform, respectively. The module sets for the TCGA four subtypes were obtained from the original report [
11], including 173 glioma samples. The test set was the FPKM/TPM expression matrix based on RNA-seq platform or expression matrix based on microarray platform. Metagene projection was also performed with the corresponding model set to classify the test data set into the TCGA subtypes. Samples with low confidence score (conf. < 0.5) were considered as unidentifiable. The details for the nearest centroid based-SSP are described in our recent report [
33]. Briefly, the mean expression profiles (the centroids) for all module members were calculated for each subgroup. Samples in the test set were assigned to the nearest subtype/centroid by using Spearman correlation. Similarly, results with a
P value > 0.05 were classified as unidentifiable.
For pairs with inconsistent metagene projection and nearest centroid analysis results (Case ID: R083.R, R092.R (EGAS00001001041); R111.I (EGAS00001001800); PRB3780, PRB4094 (GSE4271); GBM003, GBM010, GBM027, GBM039 (GSE62153); PC-NS08-559 (GSE42669)), the ratio between the mean expression of the EM and PM modules was calculated manually, and the EM or PM subtype was assigned to individual samples. The final subtyping results were based on concordant subtyping outcomes in two of the three methods described here.
2.9 Analysis of copy number alterations (CNA) and somatic mutations from the WES data of multi-region samples
Qualified tissue or blood cell DNA samples were randomly disrupted into 150- to 220-bp fragments by using Covaris. Agilent SureSelect Human All Exon V6 kit was used for library construction and capture. The captured library was subjected to WES at a coverage of 100×. The reads were mapped to the reference human genome (UCSC hg19) by using Burrows-Wheeler Aligner (bwa mem) with default parameters. Then, SAMtools and Picard (Broad Institute) were used to sort the reads in terms of coordinates and mark the duplicates. The CNVkit [
34] was used to estimate the status of CNA with default parameters.
To identify somatic mutations from WES data for tumor tissue with matched blood control, we applied three variance-calling software, namely, Varscan2, Mutect2 (implemented in GATK4), and SAVI2. High-confidence somatic mutations were identified using the methods and filtering procedures described by Koboldt
et al. [
35] or based on online tutorials. Somatic mutations annotated as an intergenic or intron variant were not included in the final reports. Somatic mutations detected by at least two variance-calling software above were included in the data shown in Fig. S5B.
2.10 Analyses of SC-RNAseq data
For the clustering analysis of SC-RNAseq data based on the STRT-seq protocol, Seurat V3 single-cell analysis package (v3.1.5) [
36] was used according to the online protocol. Cells with < 10 000 Unique Molecular Identifiers (UMIs) and < 2000 detected genes were filtered out. Sctransform normalization algorithm (SCTransform) was applied to UMIs with default parameters. Top 30 significant components were used for shared nearest-neighbor clustering (FindClusters) and t-SNE visualization (RunTSNE). The clusters were assigned to cell populations based on the differentiation expression of EM/PM modules and previously reported hallmarks of oligodendrocytes, microglia/macrophages, T cells, and cell proliferation as follows:
EM: ACSS3, CDKN2C, DENND2A, DMRTA2, EGFR, ELOVL2, HS3ST3B1, ITGB8, LFNG, NCOA3, NES, NFIA, PDGFA, PMS2P11, POU3F2, PRPF31, RNF180, SALL1, SEC61G, SEMA6D, SHOX2, SNX5, SOCS2, SOX9, TNFRSF19, TRIOBP, UHRF1, VAV3, ZNF558 [
18]
PM: INAVA, C1QL1, CACNG4, CHD7, CSNK1E, DLL3, EIF4EBP2, ETV1, FAM208B, BRINP3, KLRC3, LIX1L, LPHN3, RP11-35N6.1, MARCKS, MEX3A, MMP16, MYT1, NAV1, NLGN1, NOVA1, NXPH1, OLIG1, OLIG2, PATZ1, PCGF2, PDGFRA, POLR2F, RFX7, SAPCD2, SOX4, SOX6, SOX8, TACC2, TMCC1, TSHZ1, ZEB1, ZNF22, ZNF462 [
18,
19]
Astrocytes: ALDOC, AQP4, GFAP [
37], MLC1 [
38]
Oligodendrocytes: MOBP, MBP, MOG [
26], Klk6 [
39]
OPCs: CSPG4, PTPRZ1, PCDH15 [
39]
T cells: CD2, CD3D, CD3E, CD3G [
15]
Microglias: CX3CR1, P2RY12 [
28], TMEM119 [
40]
Macrophages: CD163 [
28], S100A8, S100A9 [
40], AIF1 [
41]
Vascular endothelial cells: SLC38A11, DCN, GJA4 [
27]
Cell proliferation: MKI67, TOP2A, CDK1 [
33]
We also extracted the top 100 most differentially expressed genes as the signature genes for each cell population. Hierarchical clustering was used for the analysis of the expression profiles of the signature genes from each malignant cell population across the murine neural cell types in transcriptome data set GSE9566 [
42] at a
P value of 0.0001.
For the SC-RNAseq data based on the Smart-seq2 protocol, the values of log2(TPM + 1) were used as input to Seurat V3. The top 2000 most variably expressed genes and top 5 most significant components were used for shared nearest-neighbor clustering (FindClusters) and t-SNE visualization (RunTSNE). The clusters were assigned to cell populations as described above.
Subsequently, inferCNV (R package) was used to estimate single-cell CNV status. The cutoff value was set to 0.1 for STRT-seq single cell data and 1 for Smart-seq2.
2.11 Canonical correlation analysis (CCA) between glioma cells and neural stem/progenitor cells in adult mouse V-SVZ
To correlate the single-cell transcriptomes of human gliomas to the transcriptomes of the cell populations of V-SVZ neural stem cell lineages in adult mouse, we first extracted the single-cell transcriptome data of astrocytes, cells with the features of activated neural stem cells, transient amplifying cells and neuroblasts (aNSC + transient amplifying cells (TAC) + NB), OPCs, COPs, and neurons from the single-cell transcriptome data set of adult mouse V-SVZ (GSE109447 [
25]). Transcriptomes of human glioma and mouse V-SVZ cells were normalized and merged using the CCA function of the Seurat V3 R package. From the 2000 most variably expressed genes, Spearman correlation coefficient analysis between the cell populations was performed using the basic “cor” function in R.
2.12 Assessment of the TCGA classifiers across the cell populations inferred from single-cell RNAseq data
By using Seurat V3, the expression profiles of the TCGA classifiers were analyzed across the cell populations inferred from our own data set (GSE117891) and validated in an external data set merged from GSE70630, GSE89567, GSE84465, and GSE131928. For the CL, MES, NL, and PN subtypes, 162, 206, 129, and 178 classifiers were available in the single-cell RNAseq data, respectively.
2.13 CIBERSORTx analysis
Four Smart-seq2 protocol-based SC-RNAseq data sets from 16 IDH mutant gliomas (GSE70630 and GSE89567) and 24 IDH-WT gliomas (GSE84465 and GSE131928) were processed as described in the section “Analyses of SC-RNAseq Data” to define the malignant and non-malignant cell populations. The top 2000 most variably expressed genes and top 20 most significant components were identified.
By using CIBERSORTx [
43], we subsequently built the scRNA-seq signature matrix and enabled batch correction to quantify the frequencies of cell populations shown in Fig. S12A in the bulk transcriptomes of paired samples from CGGA and TCGA.
2.14 Identification of PR0.01 signature and cox regression analysis
To identify the transcriptomic difference between the primary and recurrent gliomas, transcriptome data of 115 primary or recurrent glioma samples from the CGGA longitudinal cohort were used. Under the scheme of EM/PM classification [
18],
t test was carried out by using Qlucore Omics Explorer 3.5 (Qlucore AB, Lund, Sweden). Differentially expressed genes were identified at a threshold at
q ≤ 0.01 and fold change ≥ 1.5. Within the EM subtype, the transcriptome data from 18 primary samples and 21 recurrent samples were compared, and no differentially expressed genes were observed between the primary and recurrent EM samples. Within the PM subtypes, the transcriptome data from 30 primary and 47 recurrent samples were compared, and 349 genes enriched in the recurrent samples were identified and referred to as PR0.01 signature. The members of the PR0.01 signature are listed as follows:
AC005789.11, AC006126.4, ACTA2, ACTG2, ACTN1, ACTN4, ACVRL1, ADAM12, ADAMTS7, ADM, AFAP1L1, AKAP12, ALYREF, ANGPT2, ANXA2, ARHGAP11A, ARHGAP11B, ASF1B, ASPM, AURKA, AURKB, BCL6B, BGN, BICD1, BIRC5, BUB1, BUB1B, C11orf82, C11orf84, CA9, CALR, CALU, CANT1, CAPN5, CAV1, CBR3, CCDC3, CCNA2, CCNB1, CCNB2, CCNF, CD248, CD93, CDC20, CDC42EP5, CDC6, CDCA3, CDCA5, CDCA8, CDH5, CDK1, CDKN2C, CDKN3, CDR2, CDT1, CENPA, CENPF, CENPL, CENPN, CENPW, CEP55, CHAF1A, CHEK1, CHI3L1, CKS2, CLEC11A, CLEC14A, CLIC1, CMTM3, CNIH4, COL15A1, COL18A1, COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL5A2, COL6A2, COL6A3, COLGALT1, CTHRC1, DCPS, DDA1, DDX39A, DHRS13, DLGAP5, DNAJB1, DUSP5, DUSP6, DYRK3, E2F1, ECE1, ECSCR, ECT2, EFNB1, EHD4, EIF4EBP1, ELK1, EMP3, EN1, ENG, ESM1, ETS1, EVA1B, EXOC3L1, EXOC3L2, EXOSC3, FADD, FAM111B, FAM114A1, FAM126A, FAM43A, FAM64A, FAM83D, FANCI, FBLIM1, FBLN1, FEN1, FKBP1A, FLT4, FN1, FNDC3B, FOXS1, FRMD8, FSTL1, GATAD2A, GDF15, GGN, GINS4, GJA4, GJC1, GLRX2, GPR124, GPX8, GRWD1, GTPBP8, GTSE1, H2AFZ, HEYL, HIC1, HIST1H4H, HJURP, HK1, HMGA1, HSPA5, HSPA6, HSPG2, HTRA3, IGFBP2, IGFBP4, IKBIP, IQGAP3, ISG20, ISLR, ITGA1, ITGA5, ITGB1, KAZALD1, KCNE3, KIAA0101, KIF11, KIF20A, KIF23, KIF2C, KIF4A, KIFC1, KLHDC8A, KPNA2, LAMB1, LAMC1, LAMC3, LBH, LDHA, LEPRE1, LGALS1, LINC00152, LMAN2, LMNB2, LOXL1, LOXL2, LRR1, LRRC32, LSM4, LUM, LXN, LZTS1, MAP2K3, MCAM, MCM4, MELK, METRNL, MIR4435-1HG, MLTK, MMP9, MMRN2, MPZL2, MSMP, MXRA7, MYBL2, MYH9, MYL9, MYO1B, NANS, NCAPG, NCAPH, NDC80, NDUFA4L2, NEK2, NEK6, NID1, NOX4, NQO1, NRM, NRN1, NRP1, NUF2, NUSAP1, NXPH4, OAF, OIP5, OLFML2A, ORAI1, OSTC, P4HB, PARVB, PBK, PCDH12, PDF, PDK1, PDK3, PDLIM1, PFN1, PGK1, PHLDA2, PKM, PKN3, PLAT, PLAUR, PLK1, PLOD1, PLVAP, PLXDC1, PMEPA1, PMM2, POC1A, PRC1, PRDX4, PRKCDBP, PRSS23, PSMD8, PTBP1, PTPN9, PTRF, PTTG1, PVR, RACGAP1, RAN, RBM14-RBM4, RBPMS, RHOJ, RNASEH2A, RP11-1035H13.3, RP11-438N16.1, RP11-480I12.5, RP11-977G19.10, RP4-673M15.1, RPP40, SEC61G, SEMA3F, SERPINE1, SERPINH1, SGOL1, SGOL2, SH2D3C, SHC1, SHCBP1, SIKE1, SLC25A19, SLC2A3, LC2A6, SMS, SNTB2, SPC24, SPON2, SPRED2, SPRED3, SPRY1, SRPX2, SSR1, STEAP3, STK17A, STK40, STT3A, SUSD2, SYDE1, TACC3, TAGLN, TAGLN2, TCF19, TEAD4, TES, TGFB1I1, TIMP1, TK1, TM4SF18, TMEM2, TMEM45A, TMEM70, TMSB10, TNFAIP8, TNFAIP8L1, TNFRSF10C, TNFRSF12A, TOP2A, TP53, TP53I3, TPM3, TPM4, TPX2, TRIB3, TRIP13, TROAP, TTK, TUBB, TUBB2A, TUBB2B, TUBG1, TXNDC5, UBA6, UBASH3B, UBE2C, UBE2T, UBL4A, UCK2, USB1, UTP23, VASN, VASP, VEGFA, VGF, VKORC1, VWF, WDR4, WDR62, WEE1, XBP1, ZNF367.
GO analysis for the PR0.01 signature was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 [
29,
30]. Functional protein–protein association network analysis was performed using STRING [
44], and visualized using Cytoscape software under the conditions of combined association score > 0.99 and degree > 6 in the STRING database.
Cox regression analysis was performed using the SPSS software, with survival time as the dependent variable. The PR0.01 score for each glioma sample was calculated as the average of normalized expression levels for PR0.01 members. MKI67 score was the value of the normalized log2 expression of Ki-67. The PR0.01 and MKI67 scores were used as variables in multivariate Cox regression analysis.
2.15 Immunohistochemical staining of NeuN in glioma samples
Sections with thickness of 5 μm were prepared from archived formalin-fixed and paraffin-embedded (FFPE) glioma samples analyzed in Fig.1. The sections were deparaffinized and processed in 10 mmol/L citrate buffer (100 °C, 10 min) for antigen repair. After 10 min of incubation in ethanol containing 3% hydrogen peroxide, sections were incubated with anti-NeuN (RBFOX3) antibody (HPA030790, 1:200 dilution) at 4 °C overnight. After 3 rounds of washing for 5 min each time in PBS, the slides were incubated with peroxidase-conjugated sheep anti-rabbit antibodies (Beijing Zhongshan Jinqiao Biotechnology, # PV-6000D) and DAB substrate. Finally, the sections were stained with hematoxylin.
2.16 RNA in situ hybridization analysis
FFPE tissue sections with thickness of 5 µm were obtained from the glioma biobank of Sanbo Brain Hospital, Capital Medical University in Beijing. RNAscope multiplex fluorescent reagent kit v2 (Advanced Cell Technologies, Cat. No. 323100) was used. Slides were baked for 1 h at 60 °C, deparaffinized, and dehydrated with xylene and ethanol. The sections were then pretreated with RNAscope hydrogen peroxide for 10 min at room temperature and RNAscope target retrieval reagent for 15 min at 98 °C. RNAscope protease plus was then applied to the sections for 30 min at 40 °C. The sections were subsequently hybridized with a probe mixture containing ELOVL2-C3 and DLL3-C1 probe (Cat. No. 425420 and 411331, respectively, Advanced Cell Technologies) for 2 h at 40 °C. Finally, amplification steps were performed per instructions and reagents provided in the RNAscope multiplex fluorescent reagent kit v2. Following incubation with DAPI for 30 s at room temperature, the sections were mounted with ProLong Gold Antifade Reagent (Cat. No. P36930, ThermoFisher). The hybridization results were examined using a Zeiss LSM700 confocal microscope.
2.17 Data and material availability
The raw sequencing data sets supporting the current study have not been deposited in a public repository because of institutional ethics restrictions, but the data are available from the corresponding authors on request. The processed data are available in the CGGA website.
All the other data supporting the findings of this study are available within the main article and supplementary files and from the corresponding authors upon reasonable request.
3 Results
3.1 EM/PM subtype-specific clinical and genomic features
Transcriptomic subtyping can be affected by the content of non-tumor (NT) cells in glioma samples [
12]. To decipher the effect of NT brain tissues on the EM/PM subtyping, we identified 35 genes (Fig. S1A and Table S1) that are predominantly enriched in neurons (Fig. S1B) as an NT identifier by using the REMBRANDT data set [
45]. Based on the signatures of the NT identifier and the reciprocal expression pattern between the EM and PM modules, we clustered 325 and 710 adult gliomas of WHO grades II–IV in the CGGA and TCGA data set, respectively. After excluding samples enriched in the NT identifier, adult gliomas were robustly assigned into either EM or PM subtype irrespective of their histological subtypes and grades. The median OS for patients with EM glioma was approximately 1.1 year, whereas patients with PM glioma exhibited a significantly long OS. EM and PM gliomas showed subtype-specific age at diagnosis (Fig.1 and 1B). At the genomic level, only 3.6% (8/220) of the EM gliomas harbored
IDH mutation, while 70.6% harbored concomitant gain of chromosome 7 and loss of chromosome 10. By contrast, 95.4% (354/371) of PM gliomas harbored mutations in
IDH. Consistent with previous reports [
46,
47], 1p19q co-deletion was found mutually exclusive to the occurrence of mutations in
ATRX and
TP53 (Fig.1); PM gliomas without 1p19q co-deletion (currently diagnosed as astrocytoma [
9]) were of poorer prognosis than PM gliomas with 1p19q co-deletion (currently diagnosed as oligodendroglioma [
9]) (Fig.1). Glioma samples with an EM
lowPM
low phenotype may not represent a distinct biological entity as previously considered [
18], and the condition may have been caused by the high contamination with the NT components. Whole exome sequencing (WES) data-based ABSOLUTE analysis [
48] showed low purity in all TCGA samples with an EM
lowPM
low phenotype (Fig. S1C). NeuN (a mature neuron marker) staining showed large portions of NT brain tissues in the CGGA EM
lowPM
low samples (Fig. S1D).
We also performed consensus clustering with the TCGA classifiers [
11]. In both CGGA and TCGA data sets, CL and MES showed poorer prognosis than the NL and PN subtypes (Fig. S2A and S2B). Although each of the TCGA subtypes contained both EM and PM gliomas and samples with an EM
lowPM
low phenotype, both CL and MES subtypes were predominantly correlated to the EM subtype, and the PN subtype was predominantly correlated to the PM subtype. However, the NL subtype was correlated to the PM subtype and EM
lowPM
low samples with high contamination of NT components (Fig. S2C).
These results show that adult diffuse gliomas can be assigned into the EM or PM subtype with distinct clinical features and canonic genomic alterations, and although IDH mutant gliomas can be further separated into astrocytomas or oligodendrogliomas according to histological features and mutually exclusive sets of genomic alterations [
8,
9], these gliomas are nearly exclusively restricted to the PM subtype.
3.2 Temporal stability of the EM/PM subtypes during glioma progression
Glioma progression is associated with extensive CNV and SNV heterogeneities [
7,
13,
23,
46,
49–
51], although canonical genomic alterations, including
IDH mutation, 1p19q co-deletion, and gain of chromosome 7/loss of chromosome 10, are largely preserved [
50,
52]. The effects of genomic heterogeneities on transcriptomic subtypes have not been fully characterized. We assessed the mode of glioma progression under the EM/PM classification scheme. First, the stability of the EM/PM subtypes was assessed in our own longitudinal cohort from 70 patients with astrocytoma or oligodendroglioma of varying grades. By using unsupervised consensus clustering, the EM and PM subtypes were identified in 141 transcriptomes from paired specimens for each patient (Fig.2 and 2B, and Table S2). After excluding 34 pairs containing samples with enriched expression of the NT identifier and one pair containing a sample for which no assignment was possible, the EM subtype was maintained in 14 of the 14 EM pairs examined, and the PM subtype in 21 of the 21 PM pairs examined, whether the recurrence is located distantly or locally to the initial lesion (Fig.2), although glioma recurrence at a distant location of the brain is reported with a high degree of clonal selection and genomic divergence [
23]. With the time to the second surgery as a surrogate of PFS, the EM gliomas progressed rapidly with a median PFS of 10.4 months, while PM gliomas had a median PFS of 50.2 months (
P < 0.0001, Fig.2). The median OS for the EM and PM gliomas were 16.5 and 59.3 months, respectively (
P < 0.0001, Fig.2). With the available samples from two pairs of EM and two pairs of PM gliomas, we used RNAscope analysis to validate the maintenance of EM or PM signatures during glioma progression. Reciprocal expression between ELOVL2, an EM marker involved in supporting EGFR signaling [
18,
53], and DLL3, a PM marker involved in OPC specification [
19,
54], was clearly maintained during glioma progression (Fig.2 and 2G).
We next validated our findings in the transcriptomes of 139 pairs of longitudinal grade III–IV astrocytomas from TCGA or other institutions [
13,
20–
23,
55] (Table S3). Their EM/PM subtypes were individually assigned using both metagene projection [
32] and centroid-based SSP [
33] approaches. In > 80% of the longitudinal sample pairs, the EM/PM subtypes were maintained using both methods. Despite histological diagnoses as grade III astrocytoma or GBM, significantly longer PFS was observed in PM gliomas (Fig. S3A and S3B). Treatment with temozolomide (TMZ) may cause a hypermutation phenotype and promote glioma progression [
13,
49,
50], and none of the nine pairs harboring TMZ-induced hypermutations [
13] changed the EM/PM subtype (Table S3). Notably, across 45 paired longitudinal samples from 20 patients, including 5 patients with 3 longitudinal samples, in the TCGA data set, concordance of the EM/PM assignment was complete, except for the case of TGA-DU-5872 for which no assignment was possible for the sample at the recurrence because of an elevated macrophage signature (Table S3). Therefore, the slightly lower subtype maintenance in previously reported cohorts is unlikely to have been caused by biological difference between the paired tumor samples, or the shortcomings in our data analyses, but more likely by the challenges in tracking longitudinal samples from individual patients. In contrast to the complete maintenance of the EM/PM subtypes, the proneural, neural, classical, and mesenchymal subtypes [
11] were maintained between 45%, 25%, 56%, and 68% of the paired samples, respectively (Fig. S3C).
At the genome level, analysis of WES or SNP6.0 data showed that recurrent EM or PM samples harbored increased CNV burdens as evidenced by extensive chromosomal gains or losses (Fig. S4A) and more SNVs (Fig. S4B), although the gain of chromosome 7/loss of chromosome 10 and co-deletion of 1p19q were preserved in the respective pairs of the primary/recurrent samples. Consistent with the elevated expression of a CDC20 co-expression module (CDC20-M) as a surrogate marker of replication stress and genomic instability [
33], the elevated expression of CDC20-M was observed in the recurrent PM gliomas (Fig. S4C). In comparison with the primary PM gliomas, EM gliomas were generally associated with an elevated cell proliferation and genomic instability, and the primary and recurrent EM gliomas harbored comparable levels of CDC20-M expression (Fig. S4C).
Therefore, despite elevated genomic alterations during glioma progression, the EM/PM subtypes are robustly maintained, and adult gliomas evolve within the EM or PM subtype with a subtype-specific progression mode.
3.3 Spatial stability of the EM/PM subtypes within individual gliomas
We next assessed whether the EM/PM subtypes show intratumor heterogeneity. RNA-seq and WES in MRI-localized multi-regional samples from 12 GBMs were performed [
56]. WES data at a coverage of 100× from 65 samples with matched germline control were available for the assessment of canonical genomic alterations and region-specific SNVs. RNA-seq data from 60 samples enabled both metagene projection- and centroid-based single-sample EM/PM subtyping. The EM or PM subtype was maintained in all regions in each of the 12 GBMs examined (Fig.3 and Table S4). By contrast, 2–3 TCGA subtypes were observed in 8 of 12 GBMs (Fig. S5A). At the genome level,
IDH mutation, 1p19q co-deletion, and gain of chromosome 7 were largely maintained across the regions, although
IDH mutation was not detected in six samples from PM glioma (patient No. 16, Fig.3). Consistent with previous reports [
7,
46], large numbers of region-specific SNVs were detected in 7 of the 9 EM gliomas and all the three PM gliomas analyzed (Fig. S5B). Preserved EM or PM subtype was also found in all multi-regional samples (up to 10 different samples per tumor) in 33 out of 35 GBMs from the Ivy Glioblastoma Atlas Project [
24]. Discordant EM/PM subtypes were observed only in 1 of the 35 GBMs examined (Fig.3 and Table S4). Similar findings were obtained in 15 multisector specimens from 5 gliomas from TCGA [
55] (Table S4). Notably, the EM/PM subtypes were maintained across the temporal and spatial samples in five gliomas from the TCGA database (Fig.3).
Therefore, despite intratumor genomic heterogeneities, the EM/PM subtypes are spatially maintained within individual gliomas.
3.4 EM/PM subtypes mirror distinct compartments in adult neural stem/progenitor cell pool
To assess whether the EM/PM subtypes could be identified in individual cells, we first generated a STRT-seq protocol-based SC-RNAseq data set containing 12 gliomas (GSE117891) [
57], including nine EM gliomas (IDH-wild type (WT) confirmed in eight samples) and three IDH mutant PM gliomas with 1p19q co-deletion (Table S5). In total, 5633 cells with > 10 000 UMIs and at least 2000 detected genes were analyzed using the Seurat platform [
36], resulting in eight cell populations with homogeneous transcriptomic profiles (Fig.4). Non-malignant cell populations, which were referred to as tumor microenvironment (TME) fraction, were identified based on the unambiguous expression of hallmarks of oligodendrocytes (OL), microglia/macrophages (Mf/MG), and T cells (Table S6). All malignant cells in the EM gliomas harbored concomitant gain of chromosome 7 and loss of chromosome 10, and malignant cells in the PM gliomas harbored co-deletion of 1p19q (Fig.4). Further, malignant cells in both EM and PM samples were associated with elevated SOX2 expression (Fig.4). Reciprocal expression pattern between the EM and PM modules was evident across the individual malignant cells; based on the relative expression levels between the EM and PM module, malignant cells were designated as EM+, EM++, EM+++, EM+PM++, and PM+++ sub-populations (Fig.4). The EM+ population, which is composed of 23/258 and 292/978 cells identified in samples GS2 and GS13, respectively, was excluded from further analysis because of lower expression levels in house-keeping genes (e.g., GAPDH and ACTB) and lineage markers, which might have been caused by the lower amounts or poorer quality of mRNA input prior to the sequencing processes. In the three PM gliomas examined (GS8, GS9, and GS14), malignant cells resided nearly exclusively in the PM+++ population (Fig.4). By contrast, cells with the PM+++ signature were not observed in the nine EM gliomas examined. These EM gliomas could be separated into two subgroups containing both EM++ and EM+++ subpopulations, or only EM++ subpopulation (samples GS1, GS2, GS11, and GS12); or containing EM++ and EM+PM++ subpopulations (samples GS3, GS5, GS6, GS7, and GS13; Fig.4 and 4E). The EM+++ subpopulation was only observed in GS1 possibly because of the elevated expression of EGFR as a consequence of focal amplification of the
EGFR locus [
19]. Similar to the malignant cells in the PM gliomas, regulators of OPC specification (PDGFRA, OLIG1, OLIG2, SOX4, SOX6, and SOX8) were concordantly expressed in EM+PM++ cells (Fig.4), although these cells also harbored concomitant gain of chromosome 7 and loss of chromosome 10 (Fig.4).
We further validated our findings in three external SC-RNAseq data sets generated at the Smart-seq2 platform, including GSE70630 and GSE89567 of IDH mutant gliomas with and without 1p19q co-deletion, respectively [
26,
28], and GSE131928 of IDH-WT gliomas [
15] (Table S5). Regardless of the status of 1p19q co-deletion, malignant cells in IDH mutant gliomas were uniformly enriched in the concordant expression of the PM members, the EM members showed a sporadic expression pattern (Figs. S6 and S7). Malignant cells in the IDH-WT gliomas could also be assigned into two populations, including those that are enriched only in the EM signature and those with PM signature on the background of EM expression. In three samples (MGH102, MGH110, and MGH125), the EM+PM++ cells predominated, whereas the 17 remaining samples contained both EM++ and EM+PM++ subpopulations at varying ratios (Fig. S8). Consistent with the identification of the EM+PM++ subpopulation in the SC-RNAseq data, analysis of the bulk transcriptome data for the expression profile of the EM and PM signatures within the EM subtype showed relatively enriched expression of the PM signature in a subset of the EM gliomas and a higher frequency of such EM gliomas harbored focal amplification of
PDGFRA compared with the remaining EM samples (24.6% vs. 14.0%,
P = 0.039,
χ2 test, Fig. S9).
In support of the above findings, three complementary analyses were carried out to map glioma cells to the cell lineages of adult mouse ventricular-subventricular zone (V-SVZ), in which quiescent NSCs become activated (aNSCs) and give rise to neuroblasts (NB), and to a less extent to immature migrating OLs, via rapidly proliferating TAC [
58,
59]. Canonical correlation analysis was first performed to project gene expression dynamics in malignant glioma cell populations identified above into V-SVZ neural stem/progenitor cell lineages, with SC-RNAseq transcriptomes from adult mouse V-SVZ neural stem/progenitor cells as the reference [
25]. The EM++ and EM+++ populations were correlated to V-SVZ astrocyte population (containing NSCs), while the EM+PM++ populations were correlated to aNSC, TAC, and NB. By contrast, the PM+++ population was correlated to the OPCs (Fig.5 and 5B). Next, we extracted the top 100 most differentially expressed genes as the signature genes for each malignant cell population (Table S6). These signature genes showed lineage-restricted expression patterns across neural cell types as analyzed in the transcriptome data set GSE9566 [
42]. Signature genes from the EM++ or EM+++ cell populations (both from IDH-WT gliomas) showed enriched expression in astrocytes and neurons but not in the OL lineage (Fig.5 and 5D). By contrast, signature genes from both EM+PM++ (from IDH-WT gliomas) and PM+++ (from IDH mutant gliomas) cell populations showed enriched expression in OPCs (developing OLs) compared with mature OLs, neurons, and astrocytes (Fig.5 and 5F). Notably, a reciprocal expression pattern between EGFR and PDGFRA was observed in the malignant cell populations in IDH-WT gliomas. In comparison with the EM+PM++ cells, the EM++ and EM+++ cells showed higher EGFR expression but weaker PDGFRA expression. By contrast, the EM+PM++ cells showed higher PDGFRA expression but declined EGFR expression (Fig. S10). These findings coincide with early reports of heterogenous
EGFR and
PDGFRA amplifications in different cell populations in GBM [
7,
60,
61]. Finally, gene ontology (GO) pathway analysis results indicated that the signature genes of the EM++ and EM+++ cell populations were enriched in the regulation of cell growth related to the insulin-like growth factor pathway (e.g., RCN1, IGFBP3, IGFBP2, TNC, IGFBP7, TIMP1, SCG2, and CYR61). The signature genes of the EM+PM++ cell population were predominantly enriched in the cell cycle and chromosome segregation and the regulation of OPC (e.g., DLL3, PDGFRA, SOX4, and SOX6) or neuron (e.g., DCX, CRMP1, ELAVL4, NEUROD1, NXPH1, PAK3, and PCP4) development. The signature genes of the PM+++ cell population were enriched in synaptic transmission, nervous system development, and ion transmembrane transport genes as the top three most significant GO terms and contained intrinsic regulator of OPC development (e.g., MYT1, OLIG1 and SOX8) [
62], signaling molecules in BMP, NOTCH, and WNT pathways (e.g., BMP2; DLL3, HES5, HES6, and HEY1; and CXXC4; respectively), which are critically involved in OPC development [
63,
64] (Table S7).
The expression profiles of the TCGA classifiers were also assessed across the cell populations inferred from single-cell RNAseq data. In both GSE117891 and the external data set (see below, Fig. S12) merged from GSE70630 [
26], GSE89567 [
28], GSE84465 [
27], and GSE131928 [
15], the CL signature was enriched in the EM++ and EM+++ populations, the PN signature in the EM+PM++, PM+++ and normal OPC populations. By contrast, the MES signature was enriched in microglia/macrophages, T cells, and vascular endothelial cells. The expression of NL signature was enriched in astrocytes, oligodendrocytes (OL), and neurons. Unlike the CL and PN signatures, both MES and NL signatures were not enriched in the malignant cell populations (Fig. S11).
Therefore, EM/PM gliomas resemble distinct developmental compartments in the adult neural stem/progenitor cell pool. EM gliomas are committed in the NSC compartment and PM gliomas in the OPC compartment.
3.5 Activities of cell proliferation and microenvironment together predict glioma progression and aggressiveness
After demonstrating the stability of the EM/PM subtypes and their resemblance to distinct compartments in neural stem/progenitor cell pool, we characterized features that distinguish the recurrent gliomas from their primary counterparts. First, we used SC-RNAseq data from 16 IDH mutant gliomas (GSE70630 [
26], GSE89567 [
28]) and 24 IDH-WT gliomas ((GSE84465 [
27], GSE131928 [
15]) to define the malignant and non-malignant cell populations and the top 2000 most variably expressed genes across the populations (Fig. S12). CIBERSORTx [
43] was then used to construct signature matrix and quantify the frequencies of these cell populations in the bulk transcriptomes of paired samples from CGGA and TCGA. Approximately 20% and 5% of the total cells in both primary and recurrent EM gliomas were macrophages and vascular endothelial cells, which were 3- to 5-fold more than their counterparts in the primary PM gliomas, respectively (Fig.6). Notably, the recurrent PM gliomas contained markedly more macrophages and vascular endothelial cells compared with the primary PM samples (Fig.6).
To seek for gene networks potentially crucial for glioma progression, we next compared the transcriptomes between the primary and recurrent samples under the EM/PM classification scheme. Under the conditions tested, significant differences between the transcriptomes of primary and recurrent EM gliomas were not observed. However, a signature of 349 genes (named PR0.01) was enriched in the recurrent PM gliomas (q = 0.01 and fold change = 1.5), which was expressed at high and comparable levels in both primary and recurrent EM gliomas (Fig.6). A large fraction of the PR0.01 members are involved in cell cycle progression and cell division (e.g., AURKA, CDC20, CCNA2, CCNB2, CDK1, CHEK1, DLGAP5, E2F1, MELK, TOP2A, TP53, and WEE1). In parallel, diverse activities of tumor microenvironment activities are embedded in PR0.01. These activities are involved in cell adhesion (e.g., ITGA1, ITGA5, and ITGB1) and cytoskeleton dynamics and cell motility (e.g., ACTA2, ACTG2, ACTN1, ACTN4, MYH9, MYL9, TPM3, and TPM4), extracellular matrix (ECM) organization and remodeling (e.g., collagen molecules, FN1, HSPG2, LMNB2, MMP9, SERPINE1, and SERPINH1), metabolism (e.g., LDHA, PGK1, PKM and XBP1), angiogenesis (e.g., ANGPT2, ANXA2, CAV1, CD93, ESM1, ETS1, FLT4/VEGFR3, NRP1, SHC1, VEGFA, and VWF), and BMP/TGF-β pathways (e.g., ACVRLK1/ALK1, ENG, FSTL1, and TGFB1I1). GO analysis confirmed that the PR0.01 signature predominantly involved cell proliferation and tumor microenvironment networks (Fig.6). This signature was strongly prognostic. Independent of the age, the status of 1p19q co-deletion, and Ki67 staining score, hazard ratios of 11.40 (95% confidence interval (CI): 3.60–36.02) and 6.59 (95% CI: 2.67–16.29) were recorded in IDH mutant/PM gliomas from CGGA and TCGA data set, respectively (Fig.6). We further divided the IDH mutant/PM gliomas according to the extent of PR0.01 expression. In both CGGA and TCGA data set, patients within the fourth quartile of PR0.01 expression showed the worst prognosis (Fig.6). In the TCGA data set, patients within the first quartile of PR0.01 expression showed a median OS of 12 years (Fig.6).
Therefore, cell proliferation and tumor microenvironment together fuel the progression of IDH mutant/PM gliomas, which may also account for the aggressiveness in the IDH-WT/EM gliomas.
4 Discussion
Based on the EM/PM classification scheme, the following findings were obtained: (1) independent of the histological subtypes and grades, the EM and PM molecular subtypes are not affected by the pervasive genomic heterogeneities but robustly preserved during progression and across individual cells; (2) the EM/PM subtypes are distinct in progression mode; (3) although IDH mutant gliomas are currently recognized as clinically distinct astrocytomas and oligodendrogliomas [
8,
9], they are developmentally restricted to the PM subtype; (4) the EM gliomas are committed in the NSC compartment, while PM gliomas are committed in the OPC compartment; and (5) recurrence is not associated with plasticity between the EM and PM molecular subtypes but is driven by the activities of cell proliferation, genomic instability, and tumor microenvironment. These findings constitute a conceptual platform toward the identification of subtype-specific vulnerabilities in adult gliomas.
In comparison with an advanced understanding of the pathogenesis and treatment outcomes in leukemias, understanding of glioma pathogenesis and treatment outcomes for patients with glioma is far behind. In leukemia diagnosis, tumors are characterized along the hematopoietic hierarchy into the myeloid or lymphoid lineage and then the levels of differentiation. This process is essential for clarifying the effects of genomic or epigenetic alterations on a given lineage and differentiation stage and alleviates the complexities and heterogeneities among patients, as leukemias can be classified into biological entities with distinct pathogenesis. Glioma subtyping has traditionally been guided by histological features [
6]. A high degree of complexities may have been generated, because histological subtypes poorly correlate to the true neural lineage and differentiation stage involved in gliomas. Consequently, hitherto considered glioma subtypes may not represent biologically distinct entities; although subtype-specific genomic alterations and prognosis are well characterized [
46,
47], subtype-specific vulnerabilities could not be identified.
Transcriptome-based classification may define biologically distinct molecular subtypes with subtype-specific pathogenesis and clinical manifestations. However, the outcome of transcriptomic classification depends on the purity of the sample and the features of the classifiers. Due to infiltrative feature of glioma growth, glioma samples contain varying extent of NT brain tissues. We therefore developed NT identifier to ascertain the extent of NT brain tissue in each glioma sample. This strategy was used to first identify those samples with high contamination of NT brain tissue. Further, gliomas are an eco-system composed of malignant cells and diverse cell populations of TME, particularly macrophages and vessel endothelial cells. High proportions of TME cells mask the signatures of the EM and PM classifiers. Therefore, those paired primary and recurrent samples with high contamination of NT tissue were excluded from the analysis of temporal stability of the EM/PM subtypes. A minor fraction of samples contained high content of TME cells, which masked the EM and PM signatures and prevented the EM/PM subtyping.
Unbiased differential gene expression analysis has been the predominant approach to identify classifiers in the bulk transcriptome or single-cell transcriptome data of GBM. Although subsets of the classifiers are involved in neural processes [
11,
20,
65], substantial fractions of the classifiers are also involved in cell proliferation [
11,
20], metabolism [
66], or affected by genomic alterations and microenvironment [
12,
15]. These features are expected to be heterogeneous across the individual cells from the same sample and vary substantially between the primary and recurrent samples from the same patient. Our analyses show that the MES signature is enriched in immune cells and vessel endothelial cells, and the NL signature in the mature neural cells. Consequently, transcriptomic subtypes defined by these classifiers are unstable between the primary and recurrent samples from the same patient [
12,
13,
66], different subtypes can co-exist across different regions or cells within the same glioma [
7,
14] and even interconvert [
13,
15].
We have synergized key principles of brain development, glioma pathogenesis, and gene network to identify evolutionarily conserved EM and PM classifiers. In a tissue or tumor sample composed of heterogenous cell populations, numerous studies have demonstrated that genes involved in the same pathway tend to be co-expressed. To capture dysregulated neural developmental programs involved in glioma pathogenesis, EGFR and PDGFRA were used as the seed genes to screen for gene co-expression networks in glioma transcriptome databases [
18]. The selection of EGFR and PDGFRA as the seed genes is based on the findings that EGFR and PDGFRA are frequently amplified, mutated, and overexpressed in gliomas [
60,
67] and that EGFR and PDGFRA are involved in cell fate specification, cell proliferation, and migration in the NSC compartment and OPCs [
62,
68]. Genes tightly co-expressed with EGFR or PDGFRA are selected as the EM and PM classifiers, and they contained critical regulators of gliogenic switch in EM (e.g., SOX9, NFIA, and POU3F2) or oligodendroglial genesis in PM (e.g., CHD7, MYT1, PDGFRA, OLIG2, and SOX4) [
18,
62]; members of the EM and PM are located in recurrently amplified or deleted regions in glioma genome with a gene dosage-dependent expression pattern [
18]. During brain development, subset of the EM members is co-expressed in the developing astrocytes and subset of the PM members in OPCs [
18]. As the EM and PM signatures are robustly maintained between the primary and recurrent gliomas from the same patient and across the regions and single cells from the same sample, these two signatures may indicate the cell states incurring subtype-specific genomic alterations. Therefore, unlike the unstable and plastic subtypes proposed in unbiased analyses [
11,
12,
15,
20,
66], glioma molecular subtypes defined by the reciprocal expression between the EM and PM classifiers are highly robust and stable.
Our findings suggest a model that adult gliomas are locked in two distinct constellations between neural developmental programs and canonical genomic alterations: (1) the EM subtype (IDH-WT gliomas) in the NSC compartment; (2) the PM subtype (IDH mutant gliomas) in the OPC compartment (Fig.7). The EM++, EM+++ and EM+PM++ subpopulations of the EM subtype resemble different differentiation stages within the NSC compartment, which may also explain previous reported intratumoral heterogeneities based on the amplifications of
EGFR and
PDGFRA [
60,
61].
Moreover, the neurodevelopmental framework is a key determinant of glioma cell identity and ensures that such identity is maintained during glioma evolution despite the accumulation of further genetic changes. As demonstrated in adult mouse glioma models, loss of glioma-related tumor suppressors Nf1, Trp53, and Pten can transform NSCs, bipotential progenitor cells and OPCs, but not the late-stage neuronal progenitors, neuroblasts, and differentiated neurons [
69]. In this context, our findings suggest that the rigid constellation between the “seed” (COO) and “soil” (oncogenic genomic alterations) is also maintained during the progression of human gliomas. Therefore, the neurodevelopmental programs embedded in the EM and PM subtypes are likely essential for both the initiation and the maintenance of IDH wild-type gliomas and IDH mutant gliomas, respectively. Once the gliomas are initiated, additional genomic alterations can unlikely alter these developmental programs. The characterization of these developmental programs may uncover new vulnerabilities that are not strictly related to the genetic makeup of gliomas.
Our findings further show that progression of the IDH mutant/PM subtype and the aggressiveness of the IDH wild-type glioblastoma/EM subtype are driven by the activities of cell proliferation and tumor microenvironment embedded in the PR0.01 signature. The PR0.01 signature is highly prognostic for the IDH mutant/PM subtype independent of the age, the status of 1p19q co-deletion, and Ki67 staining score, and it also contains known druggable targets that are involved in cell cycle progression and check points (e.g., AURKA, CDK1, CHEK1, TOP2A, and WEE1), cell adhesion and ECM remodeling (e.g., ITGA5, ITGB1, and MMP9), and angiogenesis activities (e.g., ANGPT2, NRP1, and VEGFA). Therefore, the expression of the PR0.01 signature could potentially be tailed as a biomarker to predict the extent of glioma progression and aggressiveness, stratify patients with glioma to anti-mitotic and anti-angiogenetic treatment strategies, and explore new treatment strategies.
Overwhelming efforts in translating detailed genomic characterizations and other findings into clinical benefits to glioma patients have hitherto failed [
1,
2]. We reason that the roots of these failures are already embedded in the currently conceived strategies of glioma diagnosis. If gliomas cannot be diagnosed into entities with distinct pathogenic mechanisms, etiology studies naturally fail to identify subtype-specific targets, and treatments cannot be directed to patients carrying those vulnerabilities responding to the treatment. According to the 2021 WHO classification guideline, IDH mutant gliomas are diagnosed as astrocytoma (without 1p19q co-deletion) or oligodendroglioma (with 1p19q co-deletion) [
9]. Consequently, astrocytomas and oligodendrogliomas are considered as distinct entities, they would be driven by entity-specific pathogenic mechanisms and should be treated separately. According to the findings present here and in our previous studies [
18,
19], both astrocytomas and oligodendrogliomas are restricted in the PM subtype and are enriched in the OPC signature, suggesting defect(s) in oligodendrocyte lineage development as a common pathogenic pathway for all IDH mutant gliomas.
Currently conceived heterogeneities in glioma biology have also hampered treatment development against glioma. At the genomic level,
EGFR,
PDGFRA, and
MET are found in different cells with the same glioma [
60], and considerable clonal diversities are observed during glioma progression [
13,
23,
46,
49]. At the transcriptomic level, different molecular subtypes co-exist across multiple regions [
7] or single cells [
14,
15,
26] within the same glioma. The current findings show that the cell states defined in the EM/PM molecular subtypes are robustly stable. The EM subtype contains malignant cells resembling the stem/progenitor cell pool spanning activated NSCs, transient amplifying stem cells, and neuroblasts, whereas the PM subtype contains malignant cells that resemble OPC. We believe that the developmental framework anchored in the EM/PM molecular subtypes will largely alleviate the seeming heterogeneities in adult gliomas, and eventually expediate the identification of subtype-specific vulnerabilities and treatment targets.
In summary, the stable EM and PM subtypes capture robust constellation between neural lineages and canonical genomic alterations in adult gliomas, clarification of EM/PM subtype-specific dysregulation of differentiation programs in the NSC or the OPC compartment may lead to identification of subtype-specific vulnerabilities and treatment targets. Further, EM/PM subtyping may greatly improve the prognostication and treatment decisions of adult gliomas by alleviating the heterogeneities and complexity conceived across the histological subtypes. Our approach may also be applicable for the development of ontogeny-based classification scheme and to alleviate complexities across histologically defined subtypes in other solid tumors.