1 Introduction
Acute myeloid leukemia (AML) with t(8;21)(q22;q22) is a malignant hematological proliferative disease characterized by the accumulation of clonal proliferative myeloid cells that are arrested at different hematological stages [1]. The fusion protein RUNX1-RUNX1T1, which is produced by the t(8;21) translocation, can block myeloid cell differentiation. However, this protein is not sufficient to induce leukemia, and additional genetic alterations are required for t(8;21) AML pathogenesis [2]. t(8;21) is one of the most frequent chromosomal translocations in AML and accounts for 10%–15% of adult de novo AML cases [3–6]. Most t(8;21) AML cases belong to the AML-M2b subtype in the FAB nomenclature. Although t(8;21) AML is a favorable subtype [7], it has a high relapse rate in China and some other countries, leading to poor prognosis. Thus, its pathogenesis needs investigation.
In our previous study, CD34+ myeloblasts in t(8;21) AML were classified into two heterogeneous cell populations, namely, CD34+CD117dim and CD34+CD117bright (bri), according to CD117 expression level by using the antibody combination of CD34 and CD117. These myeloblasts are blocked at different myeloid stages and have distinct characteristics, as shown by several approaches, including single-cell RNA sequencing (scRNA-seq), RNA sequencing (RNA-seq), and morphological and immunophenotypic analyses [8]. The CD34+CD117dim cell population is located at the earliest stage of myeloid differentiation, exhibits high expression of granulocyte-monocyte progenitor markers, and presents a leukemia stem cell gene expression signature. Bulk RNA-seq data in 62 patients with t(8;21) AML revealed that several genes are aberrantly upregulated in patients with different proportions of cell populations. Normal hematopoiesis and cellular differentiation are believed to strictly depend on transcriptional regulation systems. Aberrantly high gene expression in heterogeneous cell populations may lead to the abnormal phenotypes of these cells. Hence, the role of the overexpressed genes in the CD34+CD117dim/CD34+CD117bri population in t(8;21) AML deserves further investigation. In addition, our previous study revealed that the proportion of the CD34+CD117dim population is associated with the disease clinical outcome. When this proportion is combined with KIT mutation, which is a well-established prognostic factor in t(8;21) AML, t(8;21) AML can be further stratified into one low-risk subgroup, two intermediate-risk subgroups, and one high-risk subgroup. Whether the highly expressed genes in CD34+CD117dim/CD34+CD117bri cells are associated with t(8;21) AML prognosis needs to be addressed. Therefore, the present work aimed to investigate the gene expression and prognostic value of the overexpressed genes in CD34+CD117dim and CD34+CD117bri cells and the correlations between gene expression and gene mutations in t(8;21) AML.
2 Materials and methods
2.1 Patient characteristics
This study enrolled 85 patients with de novo t(8;21) AML and 21 healthy donors from Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China. Informed consent was obtained from all the patients and healthy donors in accordance with the Helsinki Declaration II.
The patients were treated with standard first-line “3+7” induction regimens consisting of idarubicin (12 mg/m2 for days 1–3) and Ara-C (100 mg/m2 for days 1–7), followed by 2–4 courses of high-dose cytarabine-based therapy (2 g/m2 every 12 h for days 1–3, a total of six doses) or allogeneic hematopoietic stem cell transplantation (allo-HSCT) as consolidation therapy. Among the 85 patients, 16 (18.82%) received allo-HSCT.
2.2 Gene expression analysis via qRT-PCR
Quantitative reverse transcription PCR (qRT-PCR) was conducted to detect the gene expression levels of LGALS1, ANXA2, EMP3, TRH, PLAC8, and IGLL1 by using TB Green™ Premix Ex TaqII (Tli RNaseH Plus) (TAKARA, Japan) on an ABI ViiA 7 detection system (Life technologies, USA). Positive and negative controls were included in all assays. All values were normalized to GAPDH mRNA levels and presented as ΔCT values. The primer sequences are listed in Table S1.
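The normalization described above can be illustrated with a minimal sketch. It assumes the common convention ΔCT = CT(target) − CT(GAPDH) and a relative expression of 2^(−ΔCT); the text does not state the exact formula, and the function names and CT values below are hypothetical.

```python
def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Delta-CT of a target gene against a reference gene (here GAPDH).

    Assumed convention: CT(target) - CT(reference).
    """
    return ct_target - ct_reference


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression relative to the reference gene, assuming 2^(-delta CT)
    (i.e., roughly 100% amplification efficiency for both primer pairs)."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


# Hypothetical CT values, for illustration only (not study data).
example_ratio = relative_expression(ct_target=25.0, ct_reference=22.0)
print(example_ratio)
```

Under this convention, a larger ΔCT corresponds to lower expression of the target gene relative to GAPDH.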
2.3 Gene expression analysis via 10x Genomics scRNA-seq
Gene expression data and cell annotations of 10x Genomics scRNA-seq were obtained from our previous study [8]. Gene set enrichment analysis (GSEA) was performed using the clusterProfiler R package [9]. Gene sets were downloaded from the Molecular Signatures Database (MSigDB, v7.1) of the Broad Institute. Hallmark gene sets (H) and Kyoto Encyclopedia of Genes and Genomes (KEGG) gene sets (C2) were used to perform GSEA in 1000-gene set. R package enrichplot was employed to visualize the GSEA results. Uniform manifold approximation and projection (UMAP) was utilized to reduce dimensionality by using the first 20 principal components for visualization. Clusters were identified using differentially expressed genes with an adjusted P value (p_val_adj) ≤ 0.05 and an average log fold change (avg_logFC) ≥ 0.2. The P values of the highly expressed genes in CD34+CD117dim and CD34+CD117bri populations were calculated using two-sided Wilcoxon test. Heatmap and ggplot2 R packages were employed for the visualization of the gene expression data.
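The two thresholds stated above (p_val_adj ≤ 0.05 and avg_logFC ≥ 0.2) can be sketched as a simple filter. This is an illustration only, not the analysis code of the study, which was carried out in R; the field names follow the text, whereas the function name, gene labels, and numbers are hypothetical.

```python
def select_markers(rows, max_padj=0.05, min_logfc=0.2):
    """Return the genes passing both thresholds named in the text.

    rows: list of dicts with keys 'gene', 'p_val_adj', 'avg_logFC'.
    """
    return [
        row["gene"]
        for row in rows
        if row["p_val_adj"] <= max_padj and row["avg_logFC"] >= min_logfc
    ]


# Made-up differential expression results, for illustration only.
results = [
    {"gene": "GENE_A", "p_val_adj": 0.001, "avg_logFC": 0.85},  # passes both
    {"gene": "GENE_B", "p_val_adj": 0.20, "avg_logFC": 1.10},   # fails p_val_adj
    {"gene": "GENE_C", "p_val_adj": 0.01, "avg_logFC": 0.05},   # fails avg_logFC
]
print(select_markers(results))
```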
2.4 TCGA cohort analysis
Data from the public database The Cancer Genome Atlas (TCGA-LAML) were used to analyze the gene expression levels of patients with different AML subtypes [10]. The raw expression matrix of fragments per kilobase per million (FPKM) and clinical data were downloaded using TCGAbiolinks [11]. log2(FPKM+1) was applied to evaluate the expression level of each gene.
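The log2(FPKM+1) transformation can be written as a short sketch; the pseudocount of 1 keeps genes with zero FPKM at a value of 0 instead of an undefined logarithm. The function name is hypothetical and the inputs are illustrative, not TCGA values.

```python
import math


def log2_fpkm(fpkm: float) -> float:
    """Return log2(FPKM + 1) for a non-negative FPKM value."""
    if fpkm < 0:
        raise ValueError("FPKM must be non-negative")
    return math.log2(fpkm + 1.0)


# Illustrative inputs only.
for value in (0.0, 1.0, 1023.0):
    print(value, log2_fpkm(value))
```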
2.5 Statistical analysis
The clinical characteristics of the two groups were analyzed using χ2 test or Fisher’s exact test for categorical parameters and Mann–Whitney U test for continuous variables. Overall survival (OS) was measured from the date of disease diagnosis to the date of death (failure) or the last follow-up time (censored). A Cox regression model was employed for the univariate and multivariate analyses of OS. Kaplan–Meier method was applied to estimate the probabilities of OS, and the log-rank test was utilized to compare survival between groups. Major molecular remission (MMR) was based on the RUNX1-RUNX1T1 transcript level as previously described [12–14]. Statistical analyses were performed with SPSS 25.0 (IBM) and GraphPad Prism 6.0.
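For readers unfamiliar with the Kaplan–Meier method, a minimal sketch of the product-limit estimator is given below. It is not the software used in this study (SPSS and GraphPad Prism were used); the function name and the follow-up data are hypothetical, with an event flag of 1 for death and 0 for censoring.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up data (months), for illustration only.
print(kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1]))
```

Censored patients contribute to the number at risk up to their last follow-up but do not lower the survival estimate themselves, which is why OS is recorded as either failure or censored above.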
3 Results
3.1 Gene expression in t(8;21) AML
The previous scRNA-seq data of nine patients with t(8;21) AML [8] were analyzed. GSEA results revealed that CD34+CD117dim cells had highly expressed genes enriched in the epithelial-mesenchymal transition (EMT), apical junction, IL6-JAK-STAT3 signaling, and TNFA signaling via NF-κB pathways. The genes associated with DNA repair and G2M checkpoint were activated in the CD34+CD117bri population (Fig. 1A). Given that the genes related to cell adhesion and EMT pathways were highly expressed in the CD34+CD117dim population, the genes participating in these pathways were further evaluated. The results showed that CRIP1, LGALS1, and EMP3, which are associated with cell adhesion and EMT, were significantly overexpressed in the CD34+CD117dim population (P < 0.0001, Fig. 1B). Thus, these genes were selected as candidate genes in the following study. The highly expressed genes in CD34+CD117bri cells with significant P values were also analyzed. PLAC8, TRH, and IGLL1 (Fig. 1B) were selected for subsequent study because they could be specifically quantified by the qRT-PCR primers. The RNA-seq data obtained from the bone marrow mononuclear cell (BMMC) samples of 62 patients with t(8;21) AML also confirmed that CRIP1, LGALS1, and EMP3 were overexpressed in patients with a high proportion of CD34+CD117dim cells, and PLAC8, TRH, and IGLL1 were highly expressed in those with a high proportion of CD34+CD117bri cells (Fig. 1C). In our previous study, the CD34+CD117dim ratio was found to be associated with clinical outcome, and patients with a high proportion of CD34+CD117dim or CD34+CD117bri populations presented poor or favorable clinical outcomes, respectively. Hence, the present work examined whether the highly expressed genes in the CD34+CD117dim (LGALS1, CRIP1, and EMP3) or CD34+CD117bri (TRH, PLAC8, and IGLL1) population are related to inferior or favorable prognoses, respectively, in t(8;21) AML.
3.2 Gene expression in AML
The expression levels of LGALS1, CRIP1, EMP3, TRH, PLAC8, and IGLL1 in BMMC samples from different AML subtypes, including M0–M7 in the FAB nomenclature, were compared using the bulk RNA-seq data from the TCGA AML project. TRH and IGLL1 levels in t(8;21) AML were significantly higher than their mean expression in other AML subtypes, whereas LGALS1 expression in patients with t(8;21) AML was lower than its mean value in other AML subtypes. However, all AML subtypes presented high expression levels for all six genes, with most of their log2(FPKM+1) values exceeding 10 (Fig. 2).
The expression levels of LGALS1, EMP3, PLAC8, TRH, IGLL1, and CRIP1 in BMMC samples from 85 patients with t(8;21) AML and 21 healthy donors from our cohort were compared using qRT-PCR. The clinical characteristics of these 85 patients are summarized in Table 1. The expression levels of LGALS1, CRIP1, TRH, and IGLL1 in patients with t(8;21) AML were significantly higher than those in the healthy donors (Fig. 3A). Although EMP3 and PLAC8 levels were higher in patients with t(8;21) AML than in normal controls, the differences did not reach statistical significance (Fig. 3A).
3.3 Clinical features and outcomes related to gene expression in AML
The clinical relevance of the expression levels of LGALS1, CRIP1, EMP3, PLAC8, TRH, and IGLL1 was further analyzed. For the 85 patients with t(8;21) AML in our cohort, the median expression levels of LGALS1, CRIP1, EMP3, PLAC8, TRH, and IGLL1 relative to GAPDH were 0.1358, 0.0434, 0.082, 0.1039, 0.0691, and 0.06, respectively. The patients were further classified into high and low groups according to the median values of these genes. A nearly bimodal distribution was observed (Fig. 3B); thus, the median values were defined as the cutoff values. The clinical features of the high and low groups for each gene were compared (Table 2). Patients with high CRIP1 expression harbored more marrow blasts (P = 0.004) than those with low expression. They also had more cytogenetic abnormalities other than t(8;21) (P = 0.016) and were more likely to have lost a sex chromosome (P = 0.001) compared with those with low CRIP1 expression. The expression levels of CRIP1 in patients with other abnormal karyotypes were higher than those in patients with t(8;21), except for those with t(15;17) in TCGA database (Fig. S1). This finding supported our result from another perspective. The high EMP3 expression group had high blood platelet counts at diagnosis (P = 0.047). FLT3-ITD mutation and PML-RARA fusion transcript are correlated with low blood platelet counts [15–17], leading to poor prognosis in acute promyelocytic leukemia. However, whether blood platelet count is associated with clinical outcome in t(8;21) AML remains unknown [12, 18–20]. Patients with high IGLL1 expression tended to have a high median age (P = 0.001) and more myeloblasts (P = 0.005), whereas those with low PLAC8 expression tended to have KIT gene mutations (P = 0.017). No significant difference in gender, median white blood count, median hemoglobin, AM cell proportion, induction cycles, or relapse rate was observed among the different expression groups (Table 2).
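The median-based grouping described in this section can be sketched as follows. The text does not specify how a value exactly equal to the median is assigned, so the rule below (strictly above the median is "high") is an assumption; the function name and the expression values are hypothetical.

```python
from statistics import median


def split_by_median(values):
    """Dichotomize relative expression values at their median.

    Assumption: values strictly above the median are labeled 'high',
    and all others (including ties with the median) are labeled 'low'.
    """
    cutoff = median(values)
    labels = ["high" if value > cutoff else "low" for value in values]
    return cutoff, labels


# Hypothetical relative expression values, for illustration only.
cutoff, labels = split_by_median([0.02, 0.05, 0.10, 0.20, 0.40])
print(cutoff, labels)
```

With an odd number of patients, one patient sits exactly at the median, so the tie rule determines which group is larger by one.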
The clinical outcomes of 85 patients with t(8;21) AML in the cohort were further analyzed. Patients with high expression levels of LGALS1, EMP3, and CRIP1 presented significantly worse OS than those with low gene expression levels (P = 0.0146, P = 0.0007, and P = 0.0354, respectively). The high TRH and PLAC8 expression groups showed more favorable prognosis than the low expression groups (P = 0.0126 and P = 0.0059, respectively, Fig. 4). The gene expression combination could further stratify these patients into two groups with significant differences (Fig. S2). Poor relapse-free survival was observed for patients with high LGALS1 expression and those with low TRH and PLAC8 expression (P = 0.0323, P = 0.0296, and P = 0.006, respectively, Fig. S3). Data from TCGA AML project confirmed these results and showed that patients with high expression levels of LGALS1, EMP3, and CRIP1 had poor clinical outcome (P = 0.00026, P = 0.033, and P = 0.08, respectively), whereas those with high expression levels of TRH and PLAC8 presented favorable outcomes (P = 0.00013 and P = 0.0014, respectively, Fig. S4). Given the significantly higher LGALS1 expression in patients with other AML subtypes than in patients with t(8;21) AML, the relationship of LGALS1 expression and prognosis in patients with other AML subtypes was also examined by using the data from TCGA database. The result showed that patients with high LGALS1 expression presented significantly poor OS in other AML subtypes (P = 0.00017, Fig. S5).
Univariate and multivariate analyses were performed to assess whether LGALS1, CRIP1, EMP3, PLAC8, TRH, and IGLL1 expression levels are independent prognostic factors associated with OS (Table 3). The results of univariate analysis showed the association of OS with CD19 positive rate (HR, 0.400; 95% CI, 0.191−0.836; P = 0.015), CD34+CD117dim proportion (HR, 2.444; 95% CI, 1.059−5.639; P = 0.036), KIT mutation (HR, 3.340; 95% CI, 1.602−6.964; P = 0.001), MRD status (HR, 0.356; 95% CI, 0.147−0.861; P = 0.022), LGALS1 expression (HR, 2.579; 95% CI, 1.165−5.707; P = 0.019), CRIP1 expression (HR, 2.188; 95% CI, 1.027−4.659; P = 0.042), EMP3 expression (HR, 3.710; 95% CI, 1.637−8.412; P = 0.002), PLAC8 expression (HR, 0.341; 95% CI, 0.153−0.756; P = 0.008), and TRH expression (HR, 0.379; 95% CI, 0.173−0.830; P = 0.015). The results of multivariate analysis revealed that KIT mutation (HR, 3.926; 95% CI, 1.635−9.427; P = 0.002), MRD status (HR, 0.286; 95% CI, 0.108−0.758; P = 0.012), CRIP1 expression (HR, 2.651; 95% CI, 1.017−6.907; P = 0.046), and TRH expression (HR, 0.237; 95% CI, 0.093−0.602; P = 0.002) remained independent prognostic factors for OS, thus suggesting their important roles. However, no statistically significant difference was found in the CD34+CD117dim proportion and expression of CRIP1 and PLAC8. This finding could be attributed to the small sample size. Further studies are thus required to confirm these results.
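As a reading aid for the hazard ratios above, the sketch below shows how a reported HR, its 95% CI, and its P value relate to one another. It assumes a Wald-type interval that is symmetric on the log scale, which is the usual output of Cox regression software but is not stated in the text; the function names are hypothetical. The example uses the univariate result reported for the CD34+CD117dim proportion (HR, 2.444; 95% CI, 1.059−5.639; P = 0.036).

```python
import math


def hr_from_ci(lower: float, upper: float) -> float:
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


def wald_p_value(hr: float, lower: float, upper: float, z_crit: float = 1.96) -> float:
    """Two-sided Wald P value implied by an HR and its 95% CI (assumed Wald-type)."""
    se_log_hr = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = abs(math.log(hr)) / se_log_hr
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(z / math.sqrt(2.0))


# Values reported in the text for the CD34+CD117dim proportion.
print(hr_from_ci(1.059, 5.639))
print(wald_p_value(2.444, 1.059, 5.639))
```

Both quantities agree with the reported HR and P value to within rounding, which is what would be expected if the interval is a Wald interval.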
3.4 Correlation between gene expression and mutation in gene families
The relationship between gene expression and mutation in gene families was analyzed using the RNA-seq data of the 62 patients from our previous study. Patients with high PLAC8 expression harbored fewer gene mutations in the RTK/Ras family (KIT, NRAS, FLT3, JAK2, and KRAS) than those with low expression (40.6% vs. 66.7%, P = 0.046). Patients with high IGLL1 expression had fewer gene mutations in epigenetic modifiers (ASXL2, TET2, ARID2, ASXL1, KDM6A, KMT2D, EZH2, IDH2, and JMJD1C) than those with low expression levels (26.7% vs. 55.2%, P = 0.026). However, no significant difference in the RTK/Ras family, epigenetic modifiers, transcriptional factors, cohesin complex, or signaling pathways for LGALS1, EMP3, CRIP1, and TRH was observed between the high and low expression groups (Table 4).
4 Discussion
In this study, scRNA-seq and bulk RNA-seq data from our previous work [8] were used and qRT-PCR was performed to investigate the expression levels of the highly expressed genes in CD34+CD117dim and CD34+CD117bri cell populations in t(8;21) AML. The clinical characteristics associated with these genes and the related clinical outcomes were also analyzed.
scRNA-seq data revealed that EMP3, CRIP1, and LGALS1 were highly expressed in CD34+CD117dim cells. EMP3 belongs to the peripheral myelin protein 22-kDa (PMP22) gene family, is overexpressed in breast cancer, and is related to high HER-2 expression. HER-2 and EMP3 co-expression is the most important indicator of clinical outcome for patients with urothelial carcinoma. Besides, high EMP3 expression is related to poor prognosis in several tumors [21]. In this work, patients with high EMP3 expression tended to have relatively high blood platelet counts at diagnosis. This finding was reminiscent of a recent report, which showed that early platelet recovery time was an independent prognostic factor in AML [22] and thus suggested the importance of monitoring platelet recovery throughout therapy. Thus, the relationship between gene expression and platelet recovery time deserves further study. CRIP1 functions as an intracellular zinc transport protein and acts as an oncogene in regulating migration and invasion through excessive zinc-induced EMT in colorectal cancer [23]. This protein is an independent prognostic marker with remarkable predictive power in several solid tumors [24–26]. LGALS1 (galectin 1) is a glycan-binding protein that binds β-galactoside and a wide array of complex carbohydrates. This protein is upregulated in several types of cancer cells, and its abnormal expression is linked to the development, progression, and metastasis of cancers [27]. The Human Cell Landscape database showed that LGALS1 and EMP3 are expressed in stromal cell populations (Fig. S6). All the above genes are involved in EMT, which is associated with several pathways, such as the TGF-β, NF-κB, and PI3K-AKT signaling pathways [28–30]. During this process, epithelial cells acquire migratory and stem-like traits that lead them to be tumorigenic and malignant [31]. The clinical importance of EMT in solid tumors is well acknowledged, and EMT signatures could be used as indicators of poor clinical outcomes in various solid tumors [28, 30–35]. However, the role of EMT in hematological malignancies is poorly understood. A study using an inducible transgenic mouse model of MLL-AF9-driven leukemia reported that EMT-associated genes are linked to poor prognosis in AML [36]. The present work confirmed this observation and showed that high EMP3, CRIP1, and LGALS1 expression was significantly associated with poor OS in t(8;21) AML. The external cohort of the TCGA AML project also confirmed our findings. However, these results should be validated in a larger cohort. Further investigations are also needed to confirm the function of these genes in t(8;21) AML.
TRH and IGLL1 were highly expressed in the CD34+CD117bri population, and their expression levels in t(8;21) AML were significantly higher than those in other AML subtypes. This finding was consistent with previous studies [37, 38] showing that TRH expression is significantly higher in t(8;21) AML than in inv(16) AML. However, the functions of these two genes in AML remain poorly understood. One recent scRNA-seq study on primary glioblastoma showed that high TRH expression is inversely correlated with tumor invasion and may be related to cell proliferation [39]. This result was consistent with our finding that cell cycle pathways were activated in the CD34+CD117bri population. Another work on epigenetic and genetic heterogeneity in AML also revealed that high IGLL1 expression is correlated with cell cycle and DNA repair [40]. Given that patients with a high proportion of CD34+CD117bri cells presented favorable clinical outcomes, those with high expression levels of these two genes may have a good prognosis. t(8;21) AML is considered a favorable AML subtype. Whether high TRH and IGLL1 expression could be used as a candidate prognostic marker in AML deserves further study.
Correlations between gene expression and genetic alterations were also analyzed. Although the result was preliminary, patients with low PLAC8 expression tended to have more gene mutations in the RTK/Ras family. PLAC8 is a conserved cysteine-rich protein that plays an important role in normal cellular processes and various diseases, and this action is highly reliant on cellular and physiological contexts [41, 42]. Although PLAC8 overexpression is associated with poor prognosis in several solid tumors [41–43], it was found to be related to favorable clinical outcomes in t(8;21) AML in this study. Whether the prognostic effect of high PLAC8 expression is affected by gene mutations in the RTK/Ras family or whether these genes could regulate PLAC8 expression needs further investigation.
5 Summary
This study is the continuation of our previous work. A correlation was found between the clinical outcome and the highly expressed genes identified in different cell populations. The results suggested that cell adhesion and EMT might play important roles in t(8;21) AML pathogenesis. Identifying new biomarkers may lead to the development of new tools for future tailored therapy in t(8;21) AML.