1. Introduction
Monoclonal gammopathy of undetermined significance (MGUS) is defined by an abnormal monoclonal immunoglobulin or its light chain fragments detected in serum and/or urine, accompanied by a clonal plasma cell or lymphoplasmacytic cell population comprising less than 10% of the total bone marrow cellular content. This condition primarily manifests in individuals exceeding 50 years of age, with a pronounced predilection for males [
1,
2]. Although initially presenting without clinical manifestations, MGUS patients encounter a persistent risk for transformation into more aggressive plasma cell neoplasms, notably multiple myeloma (MM). Annual deterioration occurs in approximately 1% of MGUS patients, whereas roughly 10% of smoldering multiple myeloma (SMM) cases advance to MM within a five-year post-diagnostic period. Notably, 18% of MGUS patients demonstrate elevated progression risk over a 20-year timeframe [
3]. The clinically silent presentation of MGUS frequently results in diagnostic delays, thereby complicating preventive and therapeutic interventions [
4,
5].
MM emerges as the second most prevalent hematologic malignancy, characterized by dysregulated proliferation of abnormal clonal plasma cells within the bone marrow. The pathological expansion generates profound clinical manifestations, encompassing osteolytic lesions, renal dysfunction, anemia, and hypercalcemia [
6]. The progression from MGUS to MM represents a sophisticated transformation involving substantial genetic and microenvironmental modifications. Fundamental genetic alterations, encompassing chromosomal numerical variations and specific translocation events, serve critical functions in driving malignant plasma cell clonal expansion. Concurrently, the bone marrow microenvironment undergoes significant restructuring to facilitate malignant progression, characterized by enhanced vascular development and increased stromal cell adaptability [
7]. Understanding these clinical and biological markers remains crucial for predicting MGUS to MM progression, enabling strategic patient stratification and targeted monitoring with early therapeutic interventions. Despite extensive research, the precise molecular mechanisms governing MGUS to MM transformation remain incompletely understood, emphasizing the continued necessity for comprehensive investigative approaches.
Recent advancements in computational biology and genomic array methodologies have increasingly become pivotal instruments within biomedical research [
8]. Comprehensive evaluations of differentially expressed genes (DEGs) and their corresponding protein expressions in individuals with MM potentially elucidate the molecular mechanisms underlying the transition from MGUS to MM. These analytical approaches demonstrate significant potential for identifying novel diagnostic indicators and therapeutic targets. Utilizing sophisticated bioinformatics analytical strategies, this research aimed to delineate common molecular pathways, fundamental interconnected genes, and critical genetic elements associated with MGUS and MM progression. The primary research intent focused on exploring shared pathological mechanisms and potential therapeutic interventions, ultimately seeking to improve diagnostic precision and clinical management strategies for patients experiencing these hematological conditions.
2. Materials and Methods
2.1 Data Sources
2.1.1 Bulk Transcriptome Data
Transcriptome sequencing datasets pertaining to MGUS and MM were procured from the Gene Expression Omnibus (GEO) database [
9]. From the comprehensive GSE6477 dataset (GPL96 platform) [
10], which originally contains 162 samples across various disease stages, a specific subset aligned with the study aims was selected. This subset captures bone marrow transcriptome information from 15 samples explicitly labeled as healthy donors, 22 MGUS patients, and 73 newly diagnosed MM patients, while smoldering and relapsed cases were excluded to focus on primary disease progression. Similarly, from the 78 total samples available in the GSE5900 [
11] dataset (GPL570 platform) data gathered from 22 healthy subjects and 44 MGUS patients were utilized, with SMM cases being excluded. These datasets were applied for differential gene expression analysis and weighted gene co-expression network analysis (WGCNA). The GSE13633dataset (GPL2714 platform) [
12], which encompasses transcriptomic and survival data from MM patients, was utilized to develop the prognostic risk model. Crucial prognostic genes were determined via univariate Cox and least absolute shrinkage and selection operator (LASSO) regression analyses through integration of information from GSE136337 and The Cancer Genome Atlas - multiple myeloma (TCGA-MM, 787 bone marrow samples). The robustness and predictive capacity of the model were subsequently evaluated across three independent cohorts: TCGA-MM, GSE4581 (414 MM, GPL570 platform), and GSE57317 (55 MM, GPL570 platform) [
13].
Each GEO dataset was examined independently within its corresponding platform, without combining raw data across GPL96, GPL570, or GPL2714 platforms. For each dataset, raw expression data were subjected to background correction and quantile normalization, followed by log2 transformation. All data underwent quality control and standardization procedures to ensure internal consistency and comparability across analyses.
2.1.2 Identification of DEGs Among the MGUS Group, the MM Group, and the Healthy Group
Initially, the ‘limma’ package [
14] was employed to determine DEGs within the MGUS and MM cohorts versus control specimens. The selection parameters for DEGs were established as AveExpr
5,
logFC
1, and
p 0.05.
2.2 WGCNA for Coexpression Network Construction
Gene module identification concerning MGUS and MM progression was executed through independent WGCNA analyses on the GSE6477 and GSE5900 datasets. The methodology of WGCNA demonstrates exceptional proficiency in processing large-scale datasets, enabling precise identification of gene clusters substantially linked to pathological progression via sophisticated clustering and modularization methodologies [
15]. Quality control and normalization procedures were first applied to ensure data consistency. Crucially, to guarantee statistical robustness, the appropriate soft-thresholding power (
) was determined based on the scale-free topology criterion (scale-free topology fit index R
2 0.8). Utilizing this parameter, a weighted adjacency matrix was developed for each dataset and transformed into a topological overlap matrix. Module identification was achieved through hierarchical clustering coupled with dynamic tree cutting, whereby genes exhibiting similar expression profiles and potentially shared biological functions were grouped together. Finally, module–trait associations were assessed to identify modules potentially relevant to disease progression, establishing a foundation for subsequent candidate gene selection.
2.3 Venn Diagram Plotting
By integrating transcriptomic data and employing gene co-expression network analytical techniques, a comprehensive strategy was utilized to delineate pivotal genes linked to MM and MGUS progression. The investigative approach encompassed executing set-based operations across four distinct gene collections: DEGs and WGCNA module genes for both MM and MGUS. Intersection analyses were subsequently conducted, initially between MM-related DEGs and MM-related WGCNA module genes (designated as set A), and subsequently between MGUS-related DEGs and MGUS-related WGCNA module genes (designated as set B). The ultimate candidate gene collection (referred to as set C) emerged from the convergence of sets A and B. This methodological strategy was graphically represented through a Venn diagram, wherein individual closed curves depicted various gene collections, with intersecting regions emphasizing the critical genes of significant interest.
2.4 Clinical Significance of DEGs and Development of Prognostic Model
The clinical relevance of pivotal DEGs was investigated through an initial evaluation of their diagnostic potential utilizing receiver operating characteristic (ROC) analysis within the GSE6477 dataset. Correlations between gene expression patterns, patient survival outcomes, disease staging, and chromosomal aberrations were subsequently examined. Within the GSE136337 dataset, univariate Cox regression analysis identified genes demonstrating significant links to MM prognosis, while LASSO regression was employed to refine these candidates into the most informative prognostic markers. A risk score model (MGUS-related risk score, MGUSscore) was constructed based on this refined gene set, incorporating two hub genes, DAP3 and UBE2S. Patients were classified into high- and low-risk cohorts based on the median risk score, with overall survival (OS) differences evaluated through Kaplan–Meier (K–M) analysis. The prognostic value of the MGUSscore was evaluated in conjunction with clinical variables (including age, gender, International Staging System [ISS] stage, Lactate Dehydrogenase (LDH), and cytogenetics) through multivariable Cox regression analysis. Predictive performance was evaluated through time-dependent ROC curve analysis, the Concordance index (C-index), and calibration curves. A prognostic nomogram for predicting 1-, 3-, and 5-year OS was constructed utilizing the rms R package (version 6.7-0; Frank E. Harrell Jr., Vanderbilt University School of Medicine, Nashville, TN, USA). The TCGA-MM cohort was jointly analyzed with GSE136337 to facilitate core prognostic gene identification and to further evaluate the robustness and predictive performance of the MGUSscore. The detailed clinical characteristics of the GSE136337 and TCGA-MM cohorts are summarized in Table
1. External validation was subsequently performed across two independent cohorts, GSE4581 and GSE57317.
2.5 Functional Enrichment Analysis
DEGs between the two cohorts were determined utilizing the ‘limma’ package, with selection criteria of AveExpr
5,
logFC
0.5, and
p 0.05 being applied. Subsequent analyses involved Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment investigations to elucidate potential biological mechanisms and pathways. The GO enrichment approach annotated hub genes’ biological characteristics across three sub-ontological domains: molecular functions, cellular components, and biological processes. Complementarily, KEGG pathway analysis was implemented to identify potential signaling mechanisms linked to the target genes. The application of the ‘clusterProfiler’ [
16] package within R facilitated comprehensive GO/KEGG enrichment investigations of intersecting genes, potentially elucidating fundamental disease progression pathways. MM subjects were systematically split into distinct expression clusters per the median MGUSscore threshold. Subsequently, gene set enrichment analysis (GSEA) executed across these stratified subgroups offered nuanced insights into gene expression variation implications.
2.6 Immune-Related Analysis
Progressive advancements in disease research have rendered the substantial impact of immune responses on disease progression increasingly apparent. Immune scores, stromal scores, and ESTIMATE scores were computed for high and low-risk cohorts utilizing the ESTIMATE algorithm. The proportions of immune cell infiltration were determined through the ‘CIBERSORT’ [
17] analytical tool. Subsequently, linear regression analysis was implemented to elucidate the associations between immune cell infiltration patterns and risk scores.
2.7 Drug Sensitivity Analysis
Utilizing data procured from the Genomics of Drug Sensitivity in Cancer (GDSC) database, recognized as the most comprehensive pharmacogenomics resource available, chemotherapy sensitivity predictions were generated for individual tumor specimens through implementation of the ‘oncoPredict’ [
18] R package. The half-maximal inhibitory concentration (IC
50) values for respective pharmaceutical agents were determined via regression analysis, with predictive accuracy being assessed through 10-fold cross-validation methodology employing the GDSC training dataset.
2.8 Single-Cell Sequencing Analysis
Single-cell transcriptome datasets were procured from the GEO database, specifically the GSE176131 [
19] dataset. The repository comprised data collected from 2 healthy control subjects and 9 MM patients, facilitating comprehensive single-cell correlation investigations. Single-cell RNA-seq data processing was implemented through the ‘Seurat’ [
20] R package, with cellular expression matrix normalization conducted via the ‘LogNormalize’ methodology. Dimensionality reduction was achieved by principal component analysis, where the top 50 principal components were selected for cellular clustering based on marker gene identification using the elbow plot criterion. Subsequently, the expression profiles of DAP3 and UBE2S across varied cell populations were systematically evaluated.
2.9 SingleCell ATAC-Seq Analysis
Single-cell ATAC-seq profiling was conducted on primary specimens derived from three independent individuals: one newly diagnosed (primary) multiple myeloma patient, one relapsed/refractory multiple myeloma patient, and one healthy donor as a control. The final high-quality dataset comprised 9016 cells, consisting of 827 normal plasma cells from the healthy control, 5175 primary malignant cells from the newly diagnosed patient, and 3014 recurrent malignant cells from the relapsed patient. Reads were aligned to the hg38 genome, and peaks were identified via MACS2. Quality control retained cells with a transcription start site (TSS) enrichment score 6 and 1500 fragments. Dimensionality reduction and UMAP visualization were performed using iterative latent semantic indexing. Clusters were defined using Seurat, while gene activity scores and chromatin co-accessibility were analyzed to assess transcriptional potential and regulatory interactions. Differential accessibility was determined using ArchR (Wilcoxon test, FDR 0.05).
2.10 IHC and mIF Analysis of UBE2S and DAP3
This investigation encompassed three newly diagnosed, treatment-naïve MM patients alongside three untreated MGUS patients serving as controls. Tissue specimens underwent processing for both immunohistochemistry (IHC) and multiplex immunofluorescence (mIF) staining to evaluate UBE2S and DAP3 expression patterns. For IHC analysis, tissue specimens were deparaffinized and rehydrated, subsequently subjected to EDTA-mediated antigen retrieval, followed by serum-based blocking procedures. Primary antibodies targeting UBE2S (1:100, ab197945, Abcam, Cambridge, UK) and DAP3 (1:100, ab302889, Abcam, Cambridge, UK) were applied, with 3,3′-Diaminobenzidine (DAB) staining employed for visualization. Hematoxylin counterstaining was implemented, and images were acquired using light microscopy (BX53, Olympus, Tokyo, Japan).
For mIF analysis, specimens underwent EDTA antigen retrieval treatment and 3% hydrogen peroxide exposure in darkness for 25 minutes, succeeded by serum blocking. Primary antibodies underwent overnight incubation at 4 °C, after which fluorescent secondary antibodies were incubated for 1 hour. Nuclear staining was accomplished using 4′,6-diamidino-2-phenylindole (DAPI), and the slides were subsequently mounted employing antifade mounting medium. mIF signals for UBE2S (red), DAP3 (green), and nuclei (blue) were visualized under fluorescence microscopy. Tissue specimens were subjected to IHC and mIF staining procedures utilizing commercially available reagents and antibodies, with comprehensive details provided in Supplementary Tables 1,2.
2.11 Statistical Analysis
Statistical analyses were conducted utilizing R language (version 4.3.3, Foundation for Statistical Computing, Vienna, Austria), Comparisons between two groups were performed using the Wilcoxon rank-sum test, while comparisons among three or more groups were conducted using the Kruskal-Wallis test. A
p 0.05 was designated as the threshold for statistical significance. The comprehensive flowchart delineating the entire bioinformatics analytical procedure is illustrated in Fig.
1. Semi-quantitative evaluation of immunofluorescence outcomes was executed through ImageJ software (National Institutes of Health, Bethesda, MD, USA). Data visualization was accomplished via GraphPad Prism (version 10; GraphPad Software, Boston, MA, USA), whereby bar graphs incorporating scatter plots were generated to depict the findings.
3. Result
3.1 Identification of Hub Genes From GSE6477 and GSE5900
Analysis of the GSE6477 dataset identified 961 DEGs between patients with MM and healthy controls (Fig.
2A). Conversely, the GSE5900 dataset revealed 355 DEGs between MGUS and normal controls (Fig.
2B). Heatmaps in Fig.
2C,D depict the 20 DEGs. Using WGCNA on the GSE6477 dataset, 12 gene modules significantly associated with MM were identified, each uniquely color-coded (Fig.
3A,B). The green, yellow, blue, and black modules demonstrated strong correlations with MM, with correlation coefficients and statistical significances as follows: green (
r = –0.61,
p = 4
10
-10), yellow (
r = –0.86,
p = 2
10
-27), blue (
r = 0.67,
p = 1
10
-12), and black (
r = –0.46,
p = 7
10
-6) (Fig.
3C,D). Analysis of the MGUS dataset GSE5900 also identified 12 modules, with the green module showing a significant positive correlation with MGUS (
r = 0.56,
p = 1
10
-6) (Fig.
3E,F). Focusing on the blue module linked to MM and the green module associated with MGUS, these modules were intersected with the previously identified DEGs, resulting in the identification of 12 hub genes (Fig.
3G).
3.2 ROC Curves for the 12 Intersection Genes
Diagnostic performance of the 12 intersection genes was assessed through ROC curve analysis utilizing the GSE6477 dataset. The area under the curve (AUC) serves as a metric for evaluating diagnostic accuracy. Proximity to 1.0 in AUC values signifies superior diagnostic precision, whereas values approximating 0.5 reflect diminished discriminatory power. An AUC of 0.5 represents a lack of meaningful diagnostic differentiation. The graphical representation in Fig.
4 reveals that each of the 12 genes exhibits AUC values exceeding 0.8, thereby indicating substantial diagnostic potential for MM.
3.3 Development and Validation of the Prognostic Model
The GSE136337 and TCGA datasets, encompassing OS and status information of MM patients, were initially retrieved for analytical purposes. Within GSE136337, two patients exhibiting a survival duration of 0 were eliminated from the analytical framework. GSE136337 was designated as the training cohort for risk model development, whereas TCGA-MM, in conjunction with GSE136337, was employed to identify pivotal prognostic genes and subsequently utilized for model validation. Within the training cohort, univariate Cox regression analysis was initially executed to ascertain genes demonstrating significant association with OS (
p 0.05). To mitigate potential overfitting, these survival-associated genes were subsequently processed through LASSO Cox regression incorporating 10-fold cross-validation, establishing the optimal penalty parameter (
) and ultimately isolating two pivotal prognostic genes, DAP3 and UBE2S (Fig.
5A–D). Based on their expression profiles, an MGUSscore was computed as follows: MGUSscore = (0.211
DAP3 expression) + (0.364
UBE2S expression). Patients were stratified into distinct risk cohorts per the median MGUSscore. K–M survival analysis demonstrated statistically significant disparities in OS, with individuals in the high-risk cohort demonstrating markedly diminished OS versus those within the low-risk cohort (
p 0.001; Fig.
5E).
3.4 Assessment of the Prognostic Risk Model
Significant differences were observed regarding ISS stage, LDH,
2-microglobulin, albumin, del1p, del13q, and del1p32 (Fig.
6A,B). The predictive performance of the risk model was evaluated using time-dependent ROC curve analysis based on the MGUSscore (Fig.
6C). Furthermore, multivariate Cox regression analysis demonstrated that the MGUSscore serves as an independent prognostic factor for patients with MM (hazard ratio: 2.251, 95% confidence interval: 1.570–3.228,
p 0.001) (Fig.
6D). A comprehensive nomogram integrating the risk score, patient demographics, ISS stage, LDH, and del1p was constructed to predict OS at 1-, 3-, and 5-year intervals (Fig.
6E). The concordance index (C-index) of the model was 0.673. Internal validation using 1000 bootstrap resamples confirmed the model’s reliability and discriminative ability. Calibration curves at 1, 3, and 5 years indicated good agreement between predicted and observed survival, validating the model’s stability and accuracy (Fig.
6F).
3.5 Verification of the MGUSscore Across Independent Cohorts
The prognostic effectiveness of the MGUSscore received subsequent validation through three independent cohorts. K–M survival analyses uniformly demonstrated that patients classified in the high-risk cohort exhibited notably diminished OS versus their low-risk counterparts across all datasets (Fig.
7A–C). Time-dependent ROC curve analyses further substantiated the model’s generalizability. In the TCGA-MM cohort, AUC values for 1-, 3-, and 5-year OS were ascertained as 0.660, 0.690, and 0.690, respectively (Fig.
7D). For the GSE4581 cohort, AUC values for 1-, 3-, and 5-year OS were procured as 0.661, 0.678, and 0.697, respectively (Fig.
7E). Likewise, in the GSE57317 cohort, AUC values for 1-, 2-, and 3-year OS all exceeded 0.7 (Fig.
7F).
3.6 Clinical Significance of Two Hub Genes
Differential expression validation of DAP3 and UBE2S genes through the GSE6477 dataset demonstrated statistically significant elevation in MM patient populations relative to MGUS subjects and normal individuals (Fig.
8A). Subsequent clinical investigations utilizing the GSE136337 dataset indicated that elevated genetic expression of these specific genes linked to diminished OS and progression-free survival (PFS) parameters (Fig.
8B,C). Moreover, DAP3 and UBE2S exhibited a link to the ISS of MM (Fig.
9A). Additionally, elevated UBE2S expression was linked to chromosomal abnormalities, including 1 del1p, del13q, and del1p32, whereas DAP3 was linked exclusively to del13q (Fig.
9B–D). These observations suggest that DAP3 and UBE2S may contribute to the malignant progression of MM.
3.7 Pathway Enrichment Analysis of Dysregulation of MGUSscore
Comparative expression analysis executed between the two risk cohorts identified a sum of 108 DEGs (
log
2FC
0.5,
p 0.05), consisting of 70 upregulated and 38 downregulated genes (Fig.
10A,B). KEGG pathway analysis demonstrated enrichment in pathways associated with cell cycle regulation, p53 signaling cascades, and drug resistance mechanisms, encompassing platinum drug resistance and antifolate resistance, thereby emphasizing the potential contributions of these genes to cancer therapeutic response and resistance mechanisms (Fig.
10C). GO enrichment analysis demonstrated that these DEGs participate in critical biological processes like cell division, microtubule cytoskeleton organization, and chromosome segregation, thereby highlighting their involvement in cellular proliferation and mitotic processes, particularly within cell cycle regulation (Fig.
10D). Interestingly, GSEA analysis, revealed substantial enrichment in gene sets related to ribosome function (e.g., ribosomal subunit) and mitochondrial components (e.g., mitochondrial protein-containing complex). These findings suggest a highly active cellular state in the high-risk cohort, highlighting a nexus between protein synthesis machinery and cellular bioenergetics (Fig.
10E).
3.8 Tumor Immune Microenvironment and Two Hub Genes and MGUSscore
Spearman correlation analysis demonstrated a notable link between the two core genes and infiltrating immune cells present in the tumor microenvironment (TME), with particular emphasis on plasma cells (Fig.
11A,B). Subsequent analysis demonstrated that elevated MGUSscore was markedly associated with enhanced infiltration of plasma cells, and resting memory CD4
+ T cells, while exhibiting decreased infiltration of CD8
+ T cells and M1 macrophages (Fig.
11C). The high-MGUS score cohort demonstrated reduced stromal, immune, and ESTIMATE scores, accompanied by an elevated tumor purity score, suggesting enhanced tumor purity and deteriorated prognosis (Fig.
11D).
3.9 Drug Sensitivity Analysis
Drug sensitivity predictions for individual patients were generated utilizing sensitivity data derived from the GDSC database. Within the high-risk cohort, reduced IC
50 values were observed for Venetoclax, Bortezomib, and Cyclophosphamide (Fig.
11E). The graphical representations demonstrate differential sensitivity patterns to Bortezomib across distinct risk stratifications, wherein diminished IC
50 values typically indicate enhanced drug susceptibility. Such comparative analysis enables the evaluation of therapeutic response variations between high- and low-risk patient populations, thereby establishing a foundation for personalized treatment protocols. Furthermore, the correlation between the two core genes and frequently employed anti-neoplastic agents was investigated. The findings revealed that expression levels of both genes exhibited negative correlations with the IC
50 values of Bortezomib (Fig.
11F), indicating that higher expression of these genes is associated with increased sensitivity to the drug.
3.10 Overview of Hub Gene Expression in Single Cells
Single-cell data procured from GSE176131 underwent analysis utilizing the Seurat package. Cell clustering was executed through the UMAP algorithm, with subsequent annotation into B cells, CD4
+ T cells, CD8
+ T cells, myeloid cells, immature red cells, and plasma cells to facilitate gene expression observation [
19] (Fig.
12A). Predominant expression of DAP3 and UBE2S was observed within plasma cells and CD8
+ T cells (Fig.
12B). Subsequently, plasma cells and CD8
+ T cells were extracted for additional analysis, with core genes being visualized through UMAP plots (Fig.
12C). Expression levels of DAP3 and UBE2S within plasma cells and CD8
+ T cells demonstrated elevation corresponding to MM grading progression (Fig.
12D–G). Trajectory analysis was conducted using Monocle3 [
21,
22,
23] within the identical low-dimensional space to represent potential developmental pathways of these cellular populations, thereby illustrating DAP3 and UBE2S expression patterns throughout cellular development (Fig.
12H). Validation through the Human Protein Atlas [
24] demonstrated substantial expression of DAP3 and UBE2S within plasma cells, thereby corroborating previously obtained findings (Fig.
12I,J).
3.11 Global Chromatin Remodeling and Epigenetic Activation
Single-cell ATAC-seq profiles comprising 827 normal, 5175 primary, and 3014 recurrent plasma cells were analyzed to delineate epigenetic landscapes. Dimensionality reduction utilizing UMAP revealed significant heterogeneity, with malignant populations clustering distinctly from normal controls, indicating profound chromatin remodeling (Fig.
13A,B). Assessment of chromatin-inferred gene activity demonstrated sustained DAP3 accessibility across malignant clusters (Fig.
13C), whereas UBE2S exhibited marked upregulation in myeloma cells relative to controls (Fig.
13D). Quantitatively, a distinct “Normal
Recurrent
Primary” accessibility trajectory was delineated, evidencing stage-dependent epigenetic modulation of UBE2S throughout disease progression. Genomic track analysis was subsequently performed to elucidate regulatory architectures. The DAP3 locus exhibited a consistent chromatin architecture, characterized by synchronized open chromatin signals at the DAP3 and adjacent YY1AP1 promoters (Fig.
13E). This conserved co-accessibility implies that these genes are governed by shared cis-regulatory elements or reside within a common active topologically associating domain (TAD). Conversely, the UBE2S locus displayed a coordinated regulatory unit involving a persistently open upstream element and the UBE2S promoter (Fig.
13F). While the upstream element remained stable, specific signal enrichment at the UBE2S promoter was observed in malignant samples, indicating an epigenetic shift favoring enhanced transcriptional potential.
3.12 Elevated UBE2S and DAP3 Expression in MM Versus MGUS
Both IHC and mIF analyses corroborated the observations that UBE2S and DAP3 expression levels were markedly elevated in MM tissues versus MGUS controls, demonstrating statistical significance. IHC analyses demonstrated intensified DAB staining patterns for UBE2S and DAP3 within MM specimens, whereas mIF exhibited more pronounced red (UBE2S) and green (DAP3) fluorescent signals in MM tissues (Fig.
14A,B and Fig.
15A,B). Subcellular localization analysis revealed that DAP3 was predominantly located in the cytoplasm, whereas UBE2S showed diffuse expression in both the nucleus and cytoplasm. Semi-quantitative assessment of mIF additionally validated that UBE2S and DAP3 expression levels were substantially upregulated in MM versus MGUS (
p 0.01, respectively) (Fig.
15C). These findings indicate that UBE2S and DAP3 may contribute essential functions in MM progression and merit additional investigation.
4. Discussion
MM represents an intractable malignant plasma cell neoplasm characterized by irreparable organ impairment. The disease typically progresses through three distinct stages: MGUS, SMM, and Symptomatic or active MM [
25]. In individuals aged over 50 years, MGUS prevalence surpasses 3%, exhibiting an annual progression rate to MM of 1%. SMM progresses to MM at roughly 10% per year within the initial five years following diagnosis [
26,
27]. Timely therapeutic intervention and prophylactic strategies are essential for decelerating MM progression. Nevertheless, dependable biomarkers for forecasting the transition from MGUS to MM continue to be elusive. Understanding the molecular foundation that underlies this progression remains essential for identifying high-risk MGUS patients and establishing novel therapeutic targets. Consistent with this need, Khalili
et al. [
28] recently employed a systems biology analysis to map disease evolution, highlighting the role of transcriptional dysregulation, particularly within the AP-1 family. Aligning with this perspective, our study utilizes similar bioinformatic modeling to further dissect this complex landscape, identifying DAP3 and UBE2S as distinct key drivers. These complementary findings suggest that alongside transcriptional changes, (indicated by UBE2S identification) and mitochondrial homeostasis likely represent critical for malignant transformation, validating the utility of network-based approaches in uncovering therapeutic targets.
Utilizing computational bioinformatics methodologies, an examination of MGUS (GSE5900) and MM (GSE6477) datasets facilitated the identification of DEGs shared across MGUS, MM, and control specimens. By implementing WGCNA, ddisease-related modules were identified and then cross-referenced with the detected DEGs, leading to the final selection of 12 potential candidate genes. After univariate Cox regression analysis and LASSO methodology, two pivotal genes, DAP3 and UBE2S, were identified and utilized to construct a prognostic scoring system (MGUSscore). Multivariate analysis validated this scoring system as an independent prognostic indicator, whereby patients classified as high-risk demonstrated markedly diminished OS and PFS. The predictive precision was further corroborated through ROC curve analysis and nomogram construction. However, with AUC values fluctuating between 0.66 and 0.70, the model exhibits moderate performance. This suggests that the MGUSscore should not be used in isolation. Instead, it offers the most value when integrated with standard clinical risk factors to guide patient management.
Clinical correlation analyses demonstrated substantially elevated expression levels of these two pivotal genes in MM versus MGUS and healthy controls. The identified genes exhibit correlations not only with OS and PFS but also reveal associations with ISS staging, where advanced phases demonstrate heightened gene expression patterns. Notably, UBE2S displayed enhanced expression in populations with 1p, 13q, and 1p32 deletions versus normal subgroups maintaining intact chromosomal regions, while DAP3 showed particular linkages with 13q deletions. Investigative findings indicate that 1p deletion frequency in MM approaches roughly 30%, in marked contrast to merely 6% in MGUS. This genetic alteration commonly occurs alongside 1q21 region amplification, detected in 40% of MM and 25% of MGUS cases, and links to elevated risk of MGUS progression to MM and unfavorable patient outcomes [
29]. Furthermore, del13 demonstrates connections to MGUS transformation into MM [
30]. These observations suggest potential mechanistic roles of these two critical genes in malignant cellular transition processes.
DAP3, alternatively designated as S29mt, MRPS29, and bMRP-10, represents an apoptosis-associated protein [
31]. Contemporary investigations demonstrate that DAP3 exhibits significant correlation with tumor advancement [
32] and therapeutic resistance [
33]. Functioning as a mitochondrial ribosomal constituent predominantly localized within the mitochondrial matrix, DAP3 serves essential regulatory functions in mitochondrial homeostasis. Suppression of DAP3 expression enhances cellular susceptibility to mitochondria-mediated intrinsic apoptotic pathways [
34]. Furthermore, DAP3 functions as a splicing-regulatory RNA-binding protein in malignancies, with aberrant splicing events modulated by DAP3 manifesting across diverse cancer types [
35]. UBE2S, designated as Ubiquitin-conjugating enzyme E2S, assumes pivotal significance in the pathogenesis and advancement of diverse malignant neoplasms through enhancement of cellular invasion, migration, and proliferation via modulation of DNA repair mechanisms, DNA damage responses, and cell cycle regulation [
36,
37,
38]. This investigation corroborates these observations.
Beyond biological functions, our single-cell ATAC-seq analysis uncovers distinct epigenetic mechanisms. DAP3 maintains a consistently accessible chromatin state, suggesting it serves as a stable survival factor. Conversely, UBE2S exhibits a dynamic ‘gain-of-function’ pattern, peaking in newly diagnosed MM. Notably, UBE2S accessibility in relapsed samples was lower than in primary MM. This aligns with the therapeutic clonal selection model [
39]. Treatments often eradicate dominant, highly proliferative clones—those likely possessing the highest UBE2S activity. Consequently, recurrent tumors may consist of resistant subclones with moderated expression, allowing them to evade therapy via ‘expression escape’. Crucially, it must be noted that scATAC-seq infers gene activity based on chromatin openness, representing regulatory potential rather than direct protein abundance. Therefore, actual protein expression in relapsed disease may diverge from these genomic predictions due to complex post-transcriptional regulations.
The pathogenesis of MM is profoundly influenced by the dynamic interplay between the evolving immune microenvironment and the bone marrow niche, which collectively dictate disease initiation and drug responses. Recent studies underscore that this ecosystem is not a static background but actively co-evolves with clonal plasma cells starting from early disease stages. Preliminary immune microenvironmental modifications, encompassing both immune stimulation and depletion, are detectable during the MGUS phase, offering a systems biology framework for comprehending TME restructuring [
40]. In this investigation, the elevated MGUSscore cohort demonstrated enhanced infiltration of plasma cells, quiescent and resting memory CD4
+ T cells while exhibiting decreased infiltration of CD8
+ T cells and M1 macrophages. Plasma cells, functioning as neoplastic cells in MM, displayed augmented infiltration correlated with increased tumor burden and diminished prognosis in high-risk patients [
41]. The observed paucity of cytotoxic CD8
+ T cells in the high-risk group indicates an immunosuppressive, immune-excluded tumor microenvironment, a hallmark of advanced disease and poor outcomes. Follicular helper T cells and their subset Tfh17 have been demonstrated to be modified in MM, with an increased Tfh17/Tfh ratio correlating with disease advancement [
42]. Regarding M1 macrophages, which conventionally possess anti-tumor properties, their decreased infiltration in the high-risk group suggests a potential redirection toward M2 polarization or a depletion of anti-tumor myeloid populations driven by tumor-derived factors [
43]. Dormant mast cells and eosinophils may regulate the TME through cytokine secretion or local inflammation modulation, although the precise mechanisms require further elucidation [
44].
In terms of immunoregulatory mechanisms, DAP3 has been identified as a downstream effector of the Ikaros protein [
45]. As Ikaros function is often essential for MM cell survival, its sustained activity likely drives the progressive elevation of DAP3 expression observed in our study. This aberrantly high level of DAP3 in malignant plasma cells may confer a survival advantage against immune-mediated cytotoxicity. Such persistent tumor survival, despite immune attack, could indirectly sustain chronic antigen stimulation, thereby eventually leading to the functional impairment and exhaustion of antitumor effectors described above. UBE2S may facilitate TME remodeling through modulation of immune-tumor cell interactions, functioning as a “double-edged sword” in neoplastic progression [
46]. This investigation demonstrates a positive correlation between UBE2S and DAP3 expression with plasma cell populations. Additionally, single-cell analysis substantiated that both genes demonstrate enhanced expression levels, particularly within B cell and plasma cell compartments, in MM patients, with expression intensities escalating concomitantly with disease progression. Intriguingly, our single-cell analysis revealed a more complex dynamic regarding CD8
+ T cells. While the total number of CD8
+ T cells was reduced in high-risk patients, UBE2S and DAP3 expression was significantly upregulated within the surviving CD8
+ T cell population. This implies that these genes may play an intrinsic role in T cell biology. We hypothesize that aberrant upregulation of UBE2S and DAP3 within CD8
+ T cells could contribute to their dysfunction, progressive exhaustion, or susceptibility to activation-induced cell death, ultimately leading to the reduced overall numbers observed at the tissue level. Collectively, these findings suggest that UBE2S and DAP3 actively modulate cellular communication within this ecosystem, thereby bridging the gap between clonal plasma cell evolution and the establishment of therapeutic resistance.
Integrating these observations, we propose that UBE2S and DAP3 significantly modulate cellular communication within this ecosystem, potentially bridging the gap between clonal plasma cell evolution and the establishment of therapeutic resistance. Despite the elevation of certain antitumor immune cell populations within the high-risk cohort, the overall TME demonstrated predominantly immunosuppressive features, characterized by diminished stromal, immune, and ESTIMATE scores coupled with elevated tumor purity, which potentially compromises efficacious antitumor responses and may account for the concurrent presence of elevated MGUSscore alongside adverse prognosis. This finding aligns with documented evidence in MM regarding T-cell exhaustion, Treg and MDSC activation, and enhanced expression of inhibitory checkpoint molecules, encompassing PD-1/PD-L1 [
47]. Therefore, the immunological profile observed within the high MGUSscore cohort may partially reflect the intricate regulatory influence exerted by DAP3 and UBE2S upon the tumor immune microenvironment, along with their prospective roles in MM disease progression.
Drug sensitivity analysis suggests that patients with elevated MGUSscore may demonstrate enhanced responsiveness to therapeutic agents, including Bortezomib, Venetoclax, and Cyclophosphamide. These predictions derive from computational algorithms utilizing the GDSC database, which establishes correlations between gene expression profiles and IC50 values, though experimental validation remains pending. A plausible mechanism involves the heightened dependency of high-risk malignant cells on particular molecular pathways, such as proteasome function, potentially conferring increased susceptibility to proteasome inhibitors through in silico modeling. Nevertheless, actual clinical efficacy may be modulated by supplementary variables, encompassing immunosuppressive TMEs, T-cell exhaustion phenomena, and intratumoral heterogeneity. These observations illuminate the promise of employing gene expression-based predictive models for personalized therapeutic approaches, while emphasizing the necessity for rigorous experimental and clinical validation.
Several limitations warrant acknowledgment in this investigation. Initially, the computationally predicted drug sensitivities, encompassing enhanced responsiveness to Bortezomib within the high-risk cohort, represent purely algorithmic predictions necessitating experimental or clinical validation through MM-specific patient populations. Subsequently, while this investigation emphasizes the potential contributions of UBE2S and DAP3 in modulating the TME and affecting disease progression, the fundamental molecular mechanisms remain inadequately elucidated. Future investigations should examine how these pivotal genes regulate plasma cell behavior and immune interactions. Ideally, these studies should utilize flow cytometry or spatial transcriptomics to functionally characterize immune subsets and integrate established immune signatures to validate the exhaustion phenotypes described herein. Extension of these analyses to early MGUS stages may unveil actionable vulnerabilities for early therapeutic intervention and establish a more robust biological foundation for the MGUSscore as both a prognostic and therapeutic instrument.
5. Conclusions
This bioinformatics study elucidates the pivotal contributions of two central genes (DAP3 and UBE2S) in MGUS to MM pathogenesis and disease progression. These identified genes potentially facilitate malignant transformation through diverse immune cellular pathways, exhibiting substantial correlations with MM development and prognosis. Furthermore, they demonstrate enhanced responsiveness to bortezomib therapy. The constructed prognostic scoring system (MGUSscore), derived from MGUS-associated core genes, effectively stratifies MM patients into high- and low-risk cohorts based on survival outcomes, highlighting its robust predictive capability. This investigation offers essential insights into MM pathogenesis, identifies potential therapeutic targets, and promotes precision medicine development.
Availability of Data and Materials
In this study, our data sources include the TCGA (The Cancer Genome Atlas,
https://www.cancer.gov/tcga), GEO (Gene Expression Omnibus,
https://www.ncbi.nlm.nih.gov/geo/), and the Human Protein Atlas (
https://www.proteinatlas.org/). The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Changzhou Municipal Health Commission Science and Technology Project (Advanced Technology)(QY202401)
Changzhou Science and Technology Plan Project (Social Development)(CE20235060)
Changzhou Science and Technology Project (Applied Based Research)(CJ20243014)
Changzhou Science and Technology Project (Applied Based Research)(CJ20243015)
Changzhou Science and Technology Project (Applied Based Research)(CJ20230047)
Changzhou Clinical Medical Center and the Outstanding Talent of Changzhou “The 14th Five-Year Plan” High-Level Health Talents Training Project