1 Precision oncology and omics
Precision oncology utilizes molecular tumor profiling to individualize and optimize cancer treatment. The application of high-throughput technologies to investigate the omic landscape of tumors has significantly enhanced the understanding of cancer biology and facilitated the development of specialized treatments for specific subtypes of cancer. Early attempts in precision oncology primarily focused on cancer genomics, as tumor cells exhibit genomic instability characterized by widespread gene mutation or chromosomal abnormalities [
1,
2]. For instance, the utilization of imatinib (Gleevec) has proven effective in treating chronic myeloid leukemia patients with
BCR-ABL translocation [
3]. The introduction of trastuzumab (Herceptin) has been beneficial for patients with
HER2-amplified breast cancer (BC) [
4]. Similarly, the development of epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) inhibitors has shown promise in non-small cell lung cancer (NSCLC) cases with
EGFR mutations or
ALK rearrangements [
5].
These early successes utilizing genomic markers highlighted the potential of targeted drugs and indicated future therapeutic opportunities. Extensive large-scale sequencing efforts have produced comprehensive catalogs of key genomic alterations in various cancer types, enabling the identification of potentially actionable abnormalities. One of the most notable endeavors in this regard is The Cancer Genome Atlas (TCGA) project, which undertook a systematic characterization of approximately 11 000 cases from 33 different tumor types [
6]. This work generated extensive data on somatic mutations, copy number variations (CNVs), DNA methylation, and transcriptomics-based gene expression profiles, contributing to the understanding of the molecular heterogeneity among tumor subtypes.
Despite the significant impact of genomic analyses on cancer research, molecularly targeted therapeutic strategies are not readily available for most mutations. There is still a gap in our understanding of how genomic variants specifically affect human cancers. Many tumors harbor mutations with unknown significance, making it challenging to identify actionable oncogenic drivers [
7]. Chromosomal CNVs often affect multiple genes, and the combined functional consequences of these alterations can be complex. CNV-protein correlations are generally weak across cancers, consistent with post-translational regulation. Relying solely on mutational profiling for therapeutic strategies has its limitations. While transcriptomic profiling provides additional information about mutational status, it does not reliably predict protein levels or their functional state, since the correlation between mRNA and protein levels in cancers is only around 0.45 [
8–
11]. Furthermore, post-translational modifications (PTMs) cannot be predicted from genomic or transcriptomic data. Consequently, in clinical settings, the response rates to targeted therapies based on genomic profiling have frequently been lower than anticipated, and drug resistance commonly emerges in many patients [
12].
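The modest agreement between mRNA and protein levels described above is commonly summarized as a gene-wise correlation across tumors. The following is a minimal sketch, assuming Spearman rank correlation per gene and a median across genes; the source does not state which statistic underlies the figure of around 0.45, and the gene names and values below are invented purely for illustration.

```python
from statistics import mean, median

def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(rank(x), rank(y))

def median_gene_correlation(mrna, protein):
    """Median per-gene mRNA-protein correlation over genes present in both tables."""
    shared = sorted(set(mrna) & set(protein))
    return median(spearman(mrna[g], protein[g]) for g in shared)

# Hypothetical abundance values for five tumors (illustration only).
MRNA = {"GENE_A": [1, 2, 3, 4, 5], "GENE_B": [5, 1, 4, 2, 3]}
PROTEIN = {"GENE_A": [2, 4, 6, 8, 10], "GENE_B": [1, 2, 3, 4, 5]}

if __name__ == "__main__":
    print(median_gene_correlation(MRNA, PROTEIN))
```

In a real cohort the two tables would first be restricted to matched samples and to genes quantified in both layers, which this sketch takes for granted.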
Therefore, characterizing the cancer proteome (measurement of proteins and PTMs) is critically important for personalized oncology, as genomic, epigenomic, and transcriptomic alterations ultimately impact the activity of proteins expressed in tumors. Mass spectrometry (MS)-based proteomics has emerged as a powerful technology for identifying and quantifying the global proteome and PTM sites in complex biological samples, and it is increasingly employed in cancer research. More recently, proteogenomics was developed, aiming to integrate protein and PTM data with genomic and transcriptomic information. This integration enables a more comprehensive understanding of cancer biology and facilitates insights into therapeutic outcomes. In this review, we summarize the technological advancements in MS-based proteomics as applied to proteogenomics, and describe the key findings and potential translational applications of these comprehensive cancer proteogenomic analyses. We also discuss the limitations of current proteogenomics and propose a research paradigm driven by proteogenomics.
2 Proteogenomics workflow
With the rapid advancement of proteogenomics research, standardized and well-established workflows have increasingly taken shape. The workflow begins with uniform tissue homogenization, which minimizes intra-sample heterogeneity and improves reproducibility, followed by sample aliquoting [
13]. Because formalin-fixed paraffin-embedded (FFPE) samples often suffer from DNA and RNA degradation, and elevated false-positive rates in calling of single-nucleotide variations (SNVs) and structural variations (SVs), fresh-frozen (FrFr) tissue is generally preferred for genomic analyses to ensure higher sequencing quality and more reliable data [
14–
17]. For frozen tissue samples, homogenization is typically achieved by cryogenic pulverization, ensuring that the material is evenly processed before being divided into multiple portions for downstream analyses [
13]. Typically, one portion is allocated for genomic analyses, including whole-genome sequencing (WGS), whole-exome sequencing (WES), and RNA sequencing (RNA-seq), while another portion is reserved for MS-based proteomic analyses, such as global proteome and post-translationally modified proteomes [
18–
20].
2.1 Genomics analysis
Genomic analyses typically include WGS, WES, and RNA-seq. WGS enables high-throughput sequencing of the entire genome, providing comprehensive genomic information that captures both coding and noncoding variants, including SNVs, insertions and deletions (InDels), and SVs like copy-number variations (CNVs), inversions and translocations. In contrast, WES targets only the exonic regions, which account for approximately 1%–2% of the human genome [
21], thereby markedly reducing the sequencing scope and data volume. This allows WES to achieve higher sequencing depth with improved efficiency and reduced cost [
22]. Compared to WES, WGS provides more uniform coverage and higher sensitivity for detecting SNVs and InDels, and enables more comprehensive detection of SVs including CNVs [
23,
24]. Unlike WGS and WES, RNA-seq focuses on quantifying transcript levels, particularly mRNA expression.
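The depth advantage of WES follows from simple arithmetic: for a fixed number of sequenced bases, mean coverage scales inversely with the size of the targeted region. The sketch below illustrates this under stated assumptions: a genome size of roughly 3.1 Gb, an exome fraction of 1.5% (the midpoint of the range quoted above), and a hypothetical on-target fraction for hybridization capture. None of these constants come from the source beyond the 1%–2% range.

```python
GENOME_BP = 3.1e9        # assumed approximate human genome size in base pairs
EXOME_FRACTION = 0.015   # assumed midpoint of the 1%-2% range quoted in the text
EXOME_BP = GENOME_BP * EXOME_FRACTION

def mean_depth(sequenced_bases, target_bp, on_target_fraction=1.0):
    """Mean coverage: usable (on-target) bases divided by the target size."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return sequenced_bases * on_target_fraction / target_bp

def bases_for_depth(depth, target_bp, on_target_fraction=1.0):
    """Sequenced bases needed to reach a desired mean coverage."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return depth * target_bp / on_target_fraction

if __name__ == "__main__":
    wgs_bases = bases_for_depth(30, GENOME_BP)        # 30x whole genome
    wes_bases = bases_for_depth(100, EXOME_BP, 0.7)   # 100x exome, 70% on target (hypothetical)
    print(f"WGS 30x needs {wgs_bases / 1e9:.1f} Gb; WES 100x needs {wes_bases / 1e9:.1f} Gb")
```

Mean depth is only a first approximation; it says nothing about coverage uniformity, which the text notes is better for WGS.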
Genomic analyses generally follow well-established procedures, including DNA or RNA extraction, library preparation, and sequencing [
17,
25,
26]. In proteogenomic studies, DNA and RNA extraction is typically carried out using commercial kits, which offer low cost, methodological robustness, and suitability for large-scale workflows. Commonly used nucleic acid extraction methods include spin-column-based purification, magnetic-bead-based extraction, and organic-solvent-based methods, such as phenol-chloroform [
27,
28]. Among these, spin-column and magnetic-bead-based extraction methods are widely used in commercial kits due to their simplicity and reproducibility, and are commonly applied in proteogenomics studies [
20,
29–
33].
Standard library preparation workflows typically involve nucleic acid fragmentation, adapter ligation, and optional PCR amplification [
34]. For WES, an additional exome-enrichment step is required. Numerous commercial kits are available for this purpose, with in-solution hybridization capture being the predominant strategy [
35]. In this approach, biotinylated probes hybridize to exonic regions, and the probe–target complexes are subsequently isolated using streptavidin-coated beads. For RNA-seq, mRNA enrichment or rRNA depletion is generally performed prior to fragmentation, with a wide range of commercial kits available. Poly(A) selection is the most common strategy for mRNA enrichment [
36–
38]. rRNA depletion is accomplished by hybridizing rRNA-specific probes to rRNA molecules, followed by removal through RNase H-mediated digestion or bead-based pulldown [
39,
40]. Following fragmentation, cDNA synthesis is performed prior to adapter ligation. cDNA is most commonly generated using random primers in combination with a reverse transcriptase. In addition, template-switching approaches such as SMART technology may be employed, providing distinct advantages for generating full-length cDNA and for applications requiring low-input material, including single-cell RNA-seq [
41,
42]. Next-generation sequencing (NGS) technologies have been widely adopted due to their high throughput and favorable cost efficiency [
43,
44]. Different sequencing platforms require platform-specific library construction strategies, with commonly used systems including second-generation short-read platforms such as Illumina, as well as third-generation long-read platforms such as Oxford Nanopore Technologies and Pacific Biosciences [
43,
45,
46]. Overall, the workflow of genomics analysis is well-established and can be implemented using a variety of robust experimental and sequencing platforms.
2.2 MS-based proteomics
The proteomics workflow generally involves several key steps: sample preparation (such as protein extraction, digestion, desalting, and PTM enrichment), followed by liquid chromatography (LC) and MS analysis (Fig. 1A). In large-scale cancer proteogenomic studies, FrFr and FFPE tissues are the primary sample types (Table 1). FrFr tumor tissues are considered the ideal choice and are widely used due to their superior protein preservation, accurate representation of the proteome, sufficient protein content, and minimal interference from chemical fixatives, particularly when analyzing PTMs [
47–
49]. However, one challenge with FrFr samples is their limited availability, as tumor collections are often constrained by the higher costs of sample collection. Additionally, these collections may lack comprehensive clinical data, such as treatment histories or long-term disease outcomes, due to shorter follow-up periods. Strict adherence to the standard operating procedures (SOPs) for sample collection and preservation is essential, as factors such as ischemic time, tissue weight, necrosis, and tumor purity can all influence the quality of proteomic data. Furthermore, FrFr samples are typically collected from surgical specimens, which introduces potential bias in tumor stage representation, as patients with late-stage cancers may not be eligible for surgery. Needle core biopsies, analyzed using microscaled methods [
50], offer a possible alternative for cancer proteogenomic research. However, these biopsies often provide insufficient tissue and lack corresponding normal tissue controls.
FFPE tissues are widely available clinical samples commonly used for histological, immunohistochemical, and molecular diagnostics. They serve as a valuable resource for retrospective clinical studies due to their long-term stability and the wealth of clinical outcome data often associated with them. However, these samples are not directly suitable for proteomic analysis due to formalin-induced protein crosslinking, which hinders efficient protein extraction and digestion into peptides. Nevertheless, proteomic analysis of FFPE tissues can be achieved with specialized sample preparation methods [
51,
52], such as reversing the chemical crosslinking through harsh treatments and removing reagents incompatible with MS. Additionally, techniques like laser-capture microdissection (LCM) allow the enrichment of specific tissue regions from FFPE samples for spatially resolved proteomics analysis [
53]. FFPE tissues have been successfully applied in proteogenomic studies across various cancer types, including pan-melanoma [
29], colorectal cancer (CRC) [
54], early-onset endometrioid endometrial carcinoma (EEEC) [
55], early duodenal cancer (DC) [
56], early esophageal cancer [
57], and urothelial carcinoma (UC) of the bladder [
30]. However, limitations remain, particularly in analyzing PTMs, due to the effects of the embedding process and the limited tissue available for analysis [
58]. Optimal cutting temperature (OCT) compound embedding is another tissue-processing method, mainly utilized in histopathologic analysis and clinical diagnosis [
59,
60]. Unlike FFPE, OCT-embedded samples are not affected by formalin. Nevertheless, because the OCT polymer interferes with peptide separation in the LC system and with the ionization process, the OCT compound needs to be removed first, which may result in incomplete proteome coverage [
61,
62]. Due to these disadvantages, only a few proteogenomics studies used OCT-embedded tissues for data collection [
8,
9,
11,
63–
71] (Table 1).
Technical improvements in LC and MS systems have significantly enhanced large-scale proteomic characterization of tumors, enabling comprehensive quantitative analysis of proteins and PTMs [
72,
73]. Label-free quantification (LFQ) has emerged as a popular approach as it directly measures peptide mixtures without the need for labeling, making it more convenient and cost-effective [
74]. Traditional LFQ proteomics typically relies on data-dependent acquisition (DDA) mode, in which peptides are selected and fragmented based on their intensity [75] (Fig. 1B). This method has been applied to proteogenomic studies of various cancers, such as CRC [
54], papillary thyroid cancer (PTC) [
76], CRC [
77], clear cell renal cell carcinoma (ccRCC) [
78], meningiomas [
79], and lung adenocarcinoma (LUAD) [
80] (Table 1). However, this approach may miss low-intensity peptides, leading to missing values, especially in large cohort studies.
Isobaric labeling strategies, such as tandem mass tags (TMT) and isobaric tags for relative and absolute quantification (iTRAQ), utilize multiplexed isobaric chemical tags, enabling the simultaneous analysis of up to 18 samples [
81,
82] (Fig. 1B). These strategies offer higher throughput, improved quantitative accuracy, broader coverage, and greater reproducibility. As a result, they have become the preferred methods for large-scale proteomic studies, particularly TMT, in deep proteomic analysis of clinical tumor samples. Since iTRAQ was first applied in BC proteomics [
64] and TMT in HBV-related hepatocellular carcinoma (HCC) proteomics [
83], initiatives like the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other cancer proteogenomic research groups have leveraged these techniques to characterize both proteomes and PTM-proteomes across a range of cancers, including endometrial carcinoma (EC) [
19], colon cancer [
66], lung cancer [
8,
10,
84–
86], head and neck squamous cell carcinoma (HNSCC) [
18], pancreatic ductal adenocarcinoma (PDAC) [
33], and glioblastoma (GBM) [
87] (Table 1).
Recent advancements in data-independent acquisition (DIA)-based LFQ proteomics workflows have made it a promising approach for multi-omics and proteogenomic studies [
88,
89] (Table 1). In DIA mode, all peptides within a cycling mass-to-charge (m/z) window across the entire m/z range are fragmented, providing additional information for low-intensity peptides and reducing the number of missing values in proteomic data across samples [90] (Fig. 1B). MS technologies have rapidly evolved, with innovations such as trapped ion mobility spectrometry (TIMS), parallel accumulation–serial fragmentation (PASEF), and the asymmetric track lossless (Astral) mass analyzer, further advancing the application of DIA workflows in comprehensive, single-shot analysis of proteins and PTMs (Fig. 1A) [
91,
92]. Nevertheless, the complex fragmentation spectra still pose a challenge for accurate and unambiguous peptide identification.
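The difference between the two acquisition modes discussed above can be made concrete with a toy model. The following is a simplified sketch, not an instrument method: it assumes DDA keeps only the top-N most intense precursors from a survey scan, and that DIA tiles the m/z range with contiguous fixed-width windows. The function names, the precursor list, and the window width are all hypothetical.

```python
def dda_select(precursors, top_n):
    """DDA: pick the top_n most intense precursors for fragmentation."""
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:top_n]

def dia_windows(mz_start, mz_end, width):
    """DIA: tile the m/z range with contiguous fixed-width isolation windows."""
    windows = []
    lo = mz_start
    while lo < mz_end:
        windows.append((lo, min(lo + width, mz_end)))
        lo += width
    return windows

def dia_assign(precursors, windows):
    """Group every precursor under the window [lo, hi) that co-isolates it."""
    assigned = {w: [] for w in windows}
    for mz, intensity in precursors:
        for lo, hi in windows:
            if lo <= mz < hi:
                assigned[(lo, hi)].append((mz, intensity))
                break
    return assigned

# Hypothetical (m/z, intensity) pairs from one survey scan.
PRECURSORS = [(450.2, 9e6), (455.7, 2e4), (612.3, 5e6), (780.9, 8e3)]

if __name__ == "__main__":
    print("DDA fragments:", dda_select(PRECURSORS, top_n=2))
    for window, members in dia_assign(PRECURSORS, dia_windows(400, 800, 100)).items():
        print("DIA window", window, "co-fragments", members)
```

In this toy scan, DDA with N = 2 never fragments the two low-intensity precursors, whereas every precursor falls into some DIA window. The price is visible too: the 400–500 window co-isolates two precursors, which is the source of the complex, chimeric spectra mentioned above.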
For a comprehensive analysis of PTMs, effective enrichment steps are crucial because PTMs are typically present at sub-stoichiometric levels, and abundant unmodified peptides can interfere with the detection of MS signals from modified peptides. In cancer proteogenomic investigations, prevalent PTMs comprise phosphorylation, acetylation, ubiquitylation, and glycosylation (Fig. 1A, Table 1). Phosphorylation stands out as one of the most pivotal and extensively studied PTMs, exerting critical regulatory roles in protein functionality and cellular signal transduction. Consequently, phosphorylation is extensively characterized in cancer proteogenomic studies. Enrichment strategies for phosphopeptides are primarily categorized into affinity chromatography and immunoprecipitation [
93]. Most phosphoproteomic workflows rely on affinity-based approaches, such as immobilized metal affinity chromatography (IMAC) [
63,
94–
97] and metal oxide affinity chromatography (MOAC) [
80,
86,
98,
99]. These methods exploit the negatively charged nature of phosphate groups, which enables electrostatic interactions with positively charged metal ions (e.g., Fe³⁺, Ti⁴⁺) in IMAC [
100,
101], or coordination with metal oxides such as TiO₂ in MOAC [
102,
103]. Although these techniques are highly efficient, well established, and commercially accessible, they still suffer from issues such as nonspecific binding and differential affinity toward mono- versus multi-phosphorylated peptides [
104–
106]. Immunoprecipitation represents an alternative enrichment strategy, with anti-phosphotyrosine (pTyr) antibodies being the most widely used [
107,
108]. However, this approach typically provides limited coverage of phosphosite diversity. In addition to these mainstream strategies, emerging methods such as chemical derivatization and the development of novel hybrid materials have been explored for phosphopeptide enrichment [
109–
113]. Nevertheless, these techniques have yet to gain broad adoption in routine phosphoproteomic analyses.
Protein acetylation has emerged as another important focus within the field of PTM proteomics. Acetylation, which includes histone and non-histone protein acetylation, represents a significant mechanism for regulating tumor metabolism, oncogenic or tumor-suppressive signaling, and anti-tumor immune responses involved in tumorigenesis, metastasis, or drug resistance [
114]. Three major forms of protein acetylation have been described: lysine acetylation (KAc), N-terminal (N-ter) acetylation, and O-acetylation [
115], among which KAc is most prevalent in proteogenomics research [
8,
19,
31,
63,
67,
85,
87,
116]. In MS-based acetylproteomics, immunoaffinity enrichment remains the predominant strategy [
117,
118]. However, the efficiency and depth of immunoaffinity enrichment are inherently constrained by the substrate specificity of the antibodies, which limits the capture of diverse acetylated peptides. The use of broader-specificity affinity reagents may therefore enhance acetyl-peptide enrichment and expand acetylome coverage [
115,
119]. In addition, sample prefractionation prior to LC-MS/MS analysis, such as strong cation exchange chromatography (SCX) [
115], can reduce sample complexity and may increase the number of detectable acetylation sites.
Ubiquitylation plays a pivotal role in protein degradation control and cell homeostasis maintenance. Aberrant ubiquitination can lead to conditions such as altered tumor metabolism, changes in the immunological tumor microenvironment, and modulation of cancer stem cell stemness during tumorigenesis [
120]. Ubiquitinated-peptide enrichment in ubiquitin proteomics is dominated by antibody-based strategies, including the widely used K-ε-GG monoclonal antibody, as well as the more recently developed UbiSite antibody and anti-GGX antibodies. Among these, the K-ε-GG antibody remains the standard and most widely used approach [
121,
122]. Following trypsin digestion, ubiquitinated lysine residues retain a characteristic di-glycine remnant, which results in a mass shift of +114 Da and enables selective recognition by the antibody [
123]. Although this method offers high sensitivity and is amenable to large-scale analyses, it also presents inherent limitations. Most notably, it cannot capture N-terminal ubiquitination and does not discriminate ubiquitin from ubiquitin-like modifiers (UBLs), including ISG15 and NEDD8, which produce identical di-glycine remnants after trypsin digestion. To overcome these limitations, the UbiSite antibody uses Lys-C digestion to preserve the C-terminal 13-residue ubiquitin signature peptide, enabling selective enrichment with a monoclonal antibody [
124]. This approach eliminates interference from UBLs and also captures non-canonical N-terminal ubiquitination events. Complementing this, anti-GGX antibodies specifically recognize N-terminal diglycine motifs [
125], addressing a major blind spot of K-ε-GG-based approaches. In addition to antibody-based enrichment, antibody-free chemical strategies such as COmbined FRActional DIagonal Chromatography (COFRADIC) have been developed. COFRADIC employs a workflow involving chemical blocking of primary amines, deubiquitinase treatment, and two-dimensional diagonal chromatography to selectively isolate ubiquitinated peptides [
126]. This strategy avoids antibody-related sequence bias and provides broad coverage of ubiquitination sites, but its complexity and labor-intensive workflow have limited widespread adoption relative to antibody-based techniques.
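The +114 Da shift of the di-glycine remnant mentioned above can be checked from elemental composition. The sketch below assumes the remnant adds two glycine residues (each C2H3NO, so C4H6N2O2 in total) to the lysine side chain and uses standard monoisotopic atomic masses; the helper names are illustrative only.

```python
# Monoisotopic atomic masses in daltons.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}

# Two glycine residues (C2H3NO each) left on the modified lysine.
DIGLY_FORMULA = {"C": 4, "H": 6, "N": 2, "O": 2}

def formula_mass(formula):
    """Sum monoisotopic masses over an element -> count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def mz_shift(charge):
    """Shift in m/z that the remnant causes for a peptide ion of a given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return formula_mass(DIGLY_FORMULA) / charge

if __name__ == "__main__":
    print(f"di-glycine remnant: +{formula_mass(DIGLY_FORMULA):.4f} Da")
    print(f"observed as +{mz_shift(2):.4f} m/z on a doubly charged peptide")
```

Because ISG15 and NEDD8 leave the same C4H6N2O2 remnant after trypsin digestion, this mass shift alone cannot distinguish them from ubiquitin, consistent with the limitation described above.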
Glycosylation plays a crucial role in cancer development, with glycoproteins frequently present on cell surfaces or secreted from cells, influencing cell–cell adhesion, growth, ligand-receptor interactions, and tumor metastasis [
127–
128]. Protein glycosylation is primarily classified into N-linked and O-linked glycosylation [
129]. N-linked glycosylation occurs at the conserved N-X-S/T motif (where X is any amino acid except proline) and features a relatively conserved core structure (GlcNAc₂Man₃). Consequently, its biosynthetic pathways and structural diversity have been extensively characterized. In contrast, O-linked glycosylation typically attaches to Ser or Thr residues, lacks a defined consensus sequence, and exhibits considerable heterogeneity in its core structures. In glycoproteomics, commonly used glycopeptide enrichment strategies are largely based on chromatographic principles, particularly strong anion exchange (SAX) and hydrophilic interaction liquid chromatography (HILIC) [
130–
131]. Both approaches enable broad enrichment of N- and O-glycopeptides. HILIC relies on the high hydrophilicity of glycans to selectively enrich glycopeptides. SAX-based methods include mixed-mode strong anion exchange (MAX) and Retain AX (RAX), with MAX being more widely applied in proteogenomics studies [
31–
33,
67,
89]. MAX leverages mixed-mode strong anion-exchange interactions to preferentially capture glycopeptides with weak anionic properties, thereby enhancing their detectability by mass spectrometry. In addition, several proteogenomic studies employ affinity-based techniques such as solid-phase extraction of N-linked glycosite-containing peptides (SPEG) and lectin affinity chromatography to achieve site-specific enrichment of N-glycosylation [
69,
132].
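The N-X-S/T consensus described above (X being any residue except proline) lends itself to a simple sequence scan. The following is a minimal sketch that reports candidate N-glycosylation sites in a protein sequence; the function name and the example sequence are hypothetical, and a motif match indicates only a potential site, not evidence of glycosylation.

```python
import re

# Lookahead so that overlapping motifs (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T motifs (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

if __name__ == "__main__":
    toy_sequence = "MNGTNPSANNSTK"  # invented sequence for illustration
    print(find_sequons(toy_sequence))
```

O-linked sites cannot be screened this way, since, as noted above, they lack a defined consensus sequence.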
Lysine lactylation (Kla) is a newly identified PTM of lysine residues, with lactyl-CoA serving as its putative substrate. Kla occurs on both histone and non-histone proteins and has been implicated in the regulation of epigenetic programs, signal transduction, and metabolic pathways, thereby influencing key biological processes in tumors, including metabolic reprogramming, cell differentiation, epithelial-mesenchymal transition (EMT), angiogenesis, and remodeling of the tumor microenvironment (TME) [
133–
134]. In recent years, advances in MS-based lactylation proteomics have greatly facilitated research in this field, but its integration into proteogenomic studies remains limited and warrants further development. The most widely used approach is affinity enrichment using lysine lactylation-specific antibodies. This strategy has been successfully applied to multiple cancer types, including HCC, gastrointestinal cancers, and pancreatic cancer, providing valuable insights into the roles of Kla in metabolic regulation and tumor immunity [
135–
137]. Nevertheless, the approach is still limited by antibody specificity and detection sensitivity, highlighting the need for the development of higher-performance antibodies and MS methods to achieve more comprehensive mapping of the lactylation proteome.
Furthermore, an integrated workflow for the simultaneous analysis of proteins and various PTMs through sequential enrichment strategies provides diverse omics data from the same samples and is commonly utilized in cancer proteogenomic studies [
8–
10,
18–
19,
31,
33,
63,
66–
68,
83,
85,
87,
89,
138–
141]. For instance, in TMT-based proteogenomic analysis, 5% of TMT-labeled peptides are allocated for the analysis of unmodified peptides, while 95% of peptides undergo PTM analysis. Following phosphopeptide enrichment via IMAC, the flow-through can be utilized for acetyl peptide enrichment using antibody-based methods [
8,
19,
31,
67,
85,
87] and for glycosylated peptide analysis employing MAX column-based techniques [
31,
33,
67,
89]. These strategies not only enhance analysis throughput but also reduce the required sample material for analysis.
3 Application of proteogenomics in cancer research
Since CPTAC investigators conducted the first three proteogenomic studies on colon, breast, and ovarian cancer between 2014 and 2016 [
11,
64,
70], a growing number of proteogenomic landscape studies have been published across various cancer types. According to literature searches, the past five years have witnessed a substantial rise in cancer proteogenomic research, now covering about 40 tumor types and involving over 10 000 cases (Table 1). Most of these studies focus on treatment-naïve primary tumor specimens, with only a few including samples from post-treatment, recurrent, or metastatic tumors. By integrating genomics, transcriptomics, global proteomics, and PTM data, researchers have built extensive clinical cancer proteogenomic data sets. These studies have significantly advanced our understanding of cancer biology, refined tumor molecular subtyping, explored the tumor immune microenvironment, identified biomarkers for patient stratification, prognosis, and therapeutic decision-making, determined potential drug targets, and improved the prediction of personalized treatment strategies (Fig. 2). The following are key applications in cancer research enabled by these published proteogenomic studies.
3.1 Insights into cancer biology
Proteogenomic analysis generates extensive multi-omics data, offering valuable insights into tumor biology. This integrative approach validates gene mutations at the protein level by detecting mutated proteins, facilitating the understanding of the transition from gene prediction to protein expression and tumor phenotype. CNVs, including amplifications and deletions, are prevalent in tumors, and proteogenomic data help reveal both cis and trans effects of CNVs, providing a foundation for investigating their role in tumorigenesis and progression. Additionally, proteins undergo various PTMs linked to tumor molecular pathways, and integrated proteogenomic analyses may help identify new functional PTM sites involved in cancer progression (Fig. 2A).
Proteogenomic studies across multiple cancers have revealed how genomic alterations reshape tumor signaling at the protein and phosphoprotein levels.
In a proteogenomic analysis of 95 ECs, Dou
et al. [
19] found that
CTNNB1 hotspot mutations enhance Wnt pathway activity through increased β-catenin-associated protein and phosphoprotein levels, while
APC mutations provided an alternative mechanism for β-catenin overexpression. This integrated analysis highlights the collaborative mechanisms of pathway activation that arise when the known effects of
CTNNB1 somatic mutations are coupled with
APC mutations.
In adult GBM, Wang
et al. [
87] found that
EGFR and
PDGFRA alterations converge on shared phosphorylation targets, PTPN11 and PLCG1, indicating a common RTK signaling hub and revealing interconnected pathways driving GBM.
In early DC, Li
et al. [
56] found that chromosome 8q gain increases LYN protein levels, activating MAPK signaling and promoting tumor growth, suggesting the therapeutic potential of saracatinib. In addition, DST mutations elevated DST protein and enhanced PDK1 and PRKDC activity, thereby augmenting mTOR signaling during the adenocarcinoma stage.
In a proteogenomic study of 138 EC patients, Dou
et al. [
67] showed that
PIK3R1 in-frame indels, particularly in
PTEN-mutated tumors, increase AKT1 phosphorylation, correlate with poorer outcomes, and may serve as biomarkers for AKT inhibitor response. They also found that
CTNNB1 hotspot mutations occur near β-catenin S45 regulatory sites, promoting cell proliferation and transporter activity while reducing immune scores, potentially limiting the efficacy of Wnt–FZD antagonists.
In colon cancer, Vasaikar
et al. [
31] found that, unlike in most cancers, RB1 is amplified and overexpressed. Proteogenomic analysis revealed that this increase is associated with elevated phospho-RB1, which promotes proliferation and reduces apoptosis, suggesting that targeting RB1 phosphorylation may be a potential therapeutic strategy.
In HNSCC [
18], proteogenomic analysis revealed mutual exclusivity between
FAT1 truncating mutations and 11q13.3 amplifications, both converging on altered actin dynamics in HPV-negative tumors. Phosphoproteomics also identified two modes of EGFR activation: a ligand-independent model driven by EGFR amplification, and a ligand-dependent model limited by EGFR ligand abundance. These findings suggest that EGFR ligand levels, rather than EGFR amplification, may better guide selection for anti-EGFR monoclonal antibody therapy.
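Mutual exclusivity between two alterations, as reported above for FAT1 truncating mutations and 11q13.3 amplifications, is often assessed with a one-sided Fisher's exact test on a 2×2 table of tumors. The sketch below is an illustration under that assumption; the source does not state which test the cited study used, and the counts in the example are invented.

```python
from math import comb

def mutual_exclusivity_p(both, only_a, only_b, neither):
    """One-sided Fisher's exact p-value for observing this few co-occurrences.

    Computes P(X <= both) under the hypergeometric null in which
    alterations A and B occur independently, given the table margins.
    """
    n = both + only_a + only_b + neither
    with_a = both + only_a          # tumors carrying alteration A
    with_b = both + only_b          # tumors carrying alteration B
    lowest = max(0, with_a + with_b - n)
    total = comb(n, with_b)
    tail = sum(
        comb(with_a, k) * comb(n - with_a, with_b - k)
        for k in range(lowest, both + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Hypothetical cohort: 30 tumors with A only, 25 with B only, 2 with both, 43 with neither.
    print(mutual_exclusivity_p(both=2, only_a=30, only_b=25, neither=43))
```

A small p-value indicates fewer co-occurring tumors than expected by chance; in cohort-scale analyses the p-values would additionally need correction for the number of alteration pairs tested.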
In HCC [
83], proteogenomic analysis showed that
CTNNB1-mutated tumors undergo metabolic reprogramming driven by increased phosphorylation of glycolytic enzymes, particularly ALDOA-S36, which was elevated despite slightly lower ALDOA protein levels. Functional assays confirmed that ALDOA activity is essential for the growth of
CTNNB1-mutated HCC cells.
Using phosphoproteomic data from 125 Chinese prostate cancer (PCa) patients, Dong
et al. [
141] identified FOXA1-S331 phosphorylation as a key driver of prostate cancer progression. This phosphosite correlated more strongly with AR signaling than FOXA1 protein levels, and functional assays showed that disrupting S331 phosphorylation impaired the FOXA1–AR cistrome and reduced tumor cell growth.
In a proteogenomic study of 139 Chinese cervical cancer (CC) patients, Yu
et al. [
116] identified FOSL2-K222 acetylation as a promoter of tumor progression. This modification correlated with EP300 acetylation, and functional assays showed that EP300 drives cell proliferation partly through FOSL2-K222 acetylation, highlighting EP300 as a potential therapeutic target in CC.
3.2 Characterization of tumor molecular subtypes
Cancer is a molecularly heterogeneous disease, and molecular subtyping forms the foundation for achieving precision oncology. Subtyping of common cancers, such as breast, lung, and colorectal cancers, is already used to guide clinical practice, typically classifying tumors into multiple subtypes based on clinical, genomic, or transcriptomic features. However, single-layer data often fail to fully capture the complexity of the molecular events driving cancer. A proteomics-based or multi-omics approach enables further characterization and refinement of finer subtypes based on cancer biology and clinical outcomes, which not only enhances our understanding of tumor heterogeneity and complexity, but also provides more precise prognostic insights and clinical treatment guidance (Fig. 2B).
3.2.1 Proteomics-based subtyping
In HBV-related HCC [
83], Gao
et al. defined three proteomic subtypes from 159 tumors with distinct metabolic, proliferative, and microenvironmental features. Proteomic subtyping outperformed transcriptomic classification in prognostic accuracy, and remained strongly correlated with patient outcomes.
A proteogenomic analysis of 137 treatment-naïve primary melanomas [
29] classified tumors into ECM, angiogenesis, and cell proliferation subtypes, with the angiogenesis subtype showing higher metastatic potential. Validation in two independent cohorts confirmed consistent molecular features and clinical outcomes across these subtypes.
In a study [
77] of 135 primary and 123 metastatic CRCs, proteomic clustering identified three primary and three metastatic subtypes characterized by hypoxia, stemness, and immune signatures. Hypoxia-high tumors, enriched in metastases, showed epithelial-mesenchymal transition (EMT) and metabolic reprogramming; stemness-high tumors had oncogenic pathway activation and an alternative lengthening of telomeres (ALT) phenotype; immune-cold tumors exhibited antigen presentation suppression, especially in metastatic tumors. Despite minimal genomic differences, these subtypes highlight proteomic plasticity and key features of metastatic disease.
Three proteogenomic studies of cholangiocarcinoma (CCA) [
99,
140,
150] classified tumors into distinct proteomic subtypes. Dong
et al. [
140] identified four subtypes (inflammatory, mesenchymal, metabolic, and differentiated) in 262 Chinese intrahepatic cholangiocarcinoma (iCCA) tumors. Deng
et al. [
150] classified 114 iCCA and 103 extrahepatic CCA (eCCA) tumors into three subtypes (metabolism, proliferation, stromal). Cho
et al. [
99] clustered 101 Korean iCCA tumors into metabolism, poorly immunogenic, and stem-like subtypes. Across all three studies, proteomic subtypes correlated with clinical outcomes, with the metabolism subtype showing the best prognosis.
Proteogenomic studies in lung cancer have revealed distinct proteomic subtypes across LUAD and SCC. Xu
et al. [
80] profiled 103 Chinese LUAD patients, identifying three proteomic subtypes (SI-SIII) with differences in metabolism, proliferation, prognosis, and activated kinases (e.g., AKT1, PRKCE, and AURKB). Another study [
138] of non-smoking East Asian LUAD identified three proteomic subtypes, revealing molecular heterogeneity among early-stage tumors with the same TNM stage. Notably, stage IB tumors with EGFR-L858R clustered into a “late-like” (stage IB late-like) group, showing higher metastatic potential and shorter overall survival (OS) compared to Del19 tumors. In 108 squamous cell lung cancer (SCC) patients, Stewart
et al. [
155] identified three proteomic subtypes: inflamed (immune cell-rich, high PD-1), redox (oxidation-reduction and glutathione pathways, NFE2L2/KEAP1 alterations, 3q2 gains), and mixed (Wnt signaling, increased stromal infiltration), highlighting distinct molecular and immune features. A proteogenomic analysis of 141 NSCLC tumors [
84] identified six proteomic subtypes with distinct immune landscapes and checkpoint expression. To enable clinical use, support-vector machine (SVM) and k-Top Scoring Pairs (k-TSP) classifiers were developed for cohort- and single-sample subtyping, respectively. Validation in two independent cohorts (208 early-stage and 84 late-stage cases) successfully reproduced the six subtypes, demonstrating robust clinical applicability even with limited samples.
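The k-TSP rule mentioned above suits single-sample use because it relies only on the relative ordering of protein pairs within one sample. The following Python sketch illustrates the idea for a two-class problem. It is a simplified illustration rather than the classifier built in the cited study, and the function names, the protein labels P1 to P4, and the toy values are all hypothetical.

```python
from itertools import combinations


def train_ktsp(samples, labels, k=1):
    """Pick up to k disjoint feature pairs whose ordering best separates two classes.

    samples: list of dicts mapping feature name -> abundance (hypothetical data).
    labels:  list of 0/1 class labels, one per sample.
    Returns rules (a, b, votes_one): if votes_one is True, "a < b" votes for class 1.
    """
    features = sorted(samples[0])
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    scored = []
    for a, b in combinations(features, 2):
        # Fraction of samples in each class where feature a is below feature b.
        p0 = sum(s[a] < s[b] for s in class0) / len(class0)
        p1 = sum(s[a] < s[b] for s in class1) / len(class1)
        scored.append((abs(p1 - p0), a, b, p1 > p0))
    scored.sort(key=lambda item: -item[0])  # highest score first, ties keep order
    rules, used = [], set()
    for _, a, b, votes_one in scored:
        if a in used or b in used:
            continue  # keep pairs disjoint
        rules.append((a, b, votes_one))
        used.update((a, b))
        if len(rules) == k:
            break
    return rules


def predict_ktsp(rules, sample):
    """Majority vote over the pair rules; ties fall back to class 0."""
    votes_for_one = sum((sample[a] < sample[b]) == votes_one
                        for a, b, votes_one in rules)
    return int(2 * votes_for_one > len(rules))


# Toy training data (hypothetical); the third sample of each class is on a
# different intensity scale to show that only within-sample ordering matters.
train = [
    {"P1": 9, "P2": 2, "P3": 1, "P4": 7},
    {"P1": 8, "P2": 3, "P3": 2, "P4": 6},
    {"P1": 90, "P2": 40, "P3": 10, "P4": 50},
    {"P1": 1, "P2": 8, "P3": 9, "P4": 3},
    {"P1": 2, "P2": 7, "P3": 8, "P4": 2},
    {"P1": 20, "P2": 60, "P3": 70, "P4": 30},
]
labels = [0, 0, 0, 1, 1, 1]
rules = train_ktsp(train, labels, k=1)
predicted = predict_ktsp(rules, {"P1": 500, "P2": 100, "P3": 50, "P4": 400})
```

Because each rule compares two measurements from the same sample, the prediction does not depend on cohort-level normalization, which is what makes this family of classifiers attractive for single-sample subtyping. An odd k avoids tied votes. Extending the sketch to six subtypes would need a multi-class scheme such as one-versus-rest, which is not shown here.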
In a Chinese medullary thyroid cancer (MTC) cohort [
168], protein-level clustering defined three proteomic subtypes: basal (high basal markers, best prognosis, enriched neuroendocrine features), metabolic (activated oncogenic and cell cycle pathways, HRD signature), and mesenchymal (extracellular matrix (ECM) upregulation, worst prognosis, enrichment of tyrosine kinase inhibitor (TKI) targets).
Petralia
et al. [
139] performed a proteogenomic characterization of 218 pediatric brain tumor samples across seven histological types, identifying eight proteomic subtypes with distinct survival outcomes. Proteomic and phosphoproteomic analyses revealed two craniopharyngioma (CP) subtypes, one of which resembled BRAFV600E mutant low-grade gliomas (LGG), suggesting potential benefit from MEK–ERK–AKT–targeted therapies; this similarity was not evident in the RNA data.
Proteogenomic analysis has also been applied to liquid tumors. In a study of 252 acute myeloid leukemia (AML) patients, Jayavelu
et al. [
144] discovered five proteomic subtypes with distinct biological features. One subtype termed Mito-AML, detectable only by proteomics, showed high mitochondrial protein expression, enhanced complex I-dependent respiration, poor prognosis, and increased sensitivity to therapies that impair mitochondrial function, such as the BCL2 inhibitor venetoclax.
3.2.2 Multi-omics subtyping
Beyond the ability of proteomics to reclassify cancer molecular subtypes, the strength of multi-omics approaches, widely applied in proteogenomic studies, lies in their integration of proteomics and PTM omics data with genomics data. This comprehensive integration enhances the precision of cancer subtype classification, providing deeper insights into tumor biology and improving the translation of findings into clinical application. The nonnegative matrix factorization (NMF) algorithm [
171] is commonly used for unsupervised clustering of tumor samples, identifying characteristic proteogenomic features, such as proteins, phosphopeptides, mRNA transcripts, miRNAs, and somatic CNAs, which display distinct abundance patterns across clusters. Selected highlights include lung cancer, pancreatic cancer, and head and neck cancer.
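As a rough illustration of how NMF yields sample clusters, the Python sketch below factorizes a small non-negative feature-by-sample matrix with the classic multiplicative update rules and assigns each sample to the component with the largest coefficient. It is a minimal sketch under stated assumptions, not the pipeline used in the studies discussed here: the toy matrix, the function names, the fixed rank of 2, and the single random start are all illustrative choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(x, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate x (features x samples) as w @ h with non-negative factors."""
    rng = random.Random(seed)
    n_feat, n_samp = len(x), len(x[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_feat)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(n_samp)] for _ in range(rank)]
    for _ in range(n_iter):
        # Multiplicative update for H: H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samp)]
             for k in range(rank)]
        # Multiplicative update for W: W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_feat)]
    # Rescale so every column of W sums to 1; H absorbs the scale, which makes
    # coefficients comparable across components before taking the argmax.
    scale = [sum(w[i][k] for i in range(n_feat)) or 1.0 for k in range(rank)]
    w = [[w[i][k] / scale[k] for k in range(rank)] for i in range(n_feat)]
    h = [[h[k][j] * scale[k] for j in range(n_samp)] for k in range(rank)]
    return w, h


def assign_clusters(h):
    """Label each sample (column of h) by its dominant component."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Hypothetical abundance matrix: 4 features (rows) x 6 tumors (columns).
abundance = [
    [5.0, 4.8, 5.2, 0.2, 0.1, 0.3],
    [4.9, 5.1, 5.0, 0.1, 0.2, 0.1],
    [0.2, 0.1, 0.3, 5.1, 4.9, 5.0],
    [0.1, 0.3, 0.2, 5.0, 5.2, 4.8],
]
w, h = nmf(abundance, rank=2)
clusters = assign_clusters(h)
```

In this sketch the columns of `w` play the role of cluster-defining feature profiles and the columns of `h` give each tumor's membership weights. Real multi-omics inputs would first need to be made non-negative and concatenated across data types, and the number of clusters would normally be chosen by comparing several ranks over repeated random starts rather than fixed in advance.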
Unsupervised multi-omics clustering analysis has been applied to lung cancer subtyping. Gillette
et al. [
8] analyzed the proteogenomic landscape of 110 treatment-naïve LUAD tumors, identifying four multi-omics subtypes (C1–C4) with distinct clinical and molecular features. C1 corresponded to the proximal-inflammatory mRNA subtype, characterized by elevated immune signaling and high non-synonymous mutation burden; C2 and C3 aligned with the proximal-proliferative mRNA subtype, featuring coagulation disruptions and prominent histone deacetylase activity, respectively; and C4 matched the terminal respiratory unit mRNA subtype, distinguished by surfactant metabolism, MAPK1/3 signaling, MECP2 regulation, and chromatin organization. This study highlights the value of multi-omics profiling in capturing tumor heterogeneity beyond single-layer analyses. Satpathy
et al. [
85] profiled 108 lung squamous cell carcinoma (LSCC) tumors using multi-omics clustering (CNA, RNA, protein, phosphoprotein, and acetylprotein), identifying five subtypes: basal-inclusive, EMT-enriched, classical, inflamed-secretory, and proliferation-primitive. This approach refined prior RNA-based classifications by separating basal tumors into basal-inclusive and EMT-enriched subtypes, with the latter characterized by strong EMT signatures driven by interactions between cancer-associated fibroblasts (CAFs) and tumor epithelial cells, suggesting TGF-β inhibition as a potential therapeutic strategy. Tumors with low cluster membership scores were defined as a mixed subtype, which was associated with greater heterogeneity and poorer survival. Finally, a proteogenomic analysis of 107 small cell lung cancer (SCLC) tumors [
10] identified four multi-omics subtypes (nmf1–nmf4) with distinct CNA, transcription factor, and biological profiles. Notably, nmf3 exhibited elevated RTK signaling, suggesting sensitivity to RTK inhibitors, while nmf4 showed selective upregulation of MYC at the protein and phosphorylation levels, highlighting its potential as a subtype-specific marker.
Cao
et al. [
33] analyzed proteogenomic data from 105 PDAC tumors, identifying two multi-omics subtypes (C1 and C2) largely consistent with the Moffitt classical and basal-like RNA subtypes but with 22 tumors showing discordance. Multi-omics subtyping provided stronger prognostic separation than RNA-based classifications, capturing greater differences in protein and phosphosite abundance (69% vs. 39% of proteins; 78% vs. 38% of phosphosites), highlighting its potential to improve prognostic precision and biomarker discovery in PDAC. In another PDAC cohort of 196 Asian patients [
162], integration of RNA, protein, and phosphoprotein data defined six multi-omics subtypes (Sub1–Sub6). This multi-omics approach refined RNA-based classifications, with Sub4 showing the worst median survival, high proliferation, and enrichment of cell cycle-related pathways, highlighting the complementary value of protein and phosphorylation data for prognosis and tumor characterization.
Huang
et al. [
18] profiled 108 HPV-negative HNSCC tumors using multi-omics data (CNA, RNA, miRNA, protein, phosphopeptide) and identified three subtypes, CIN, basal, and immune, via NMF clustering. When evaluating the subtypes against proposed biomarkers for targeted therapies, the multi-omics approach demonstrated its potential in guiding treatment selection. The CIN subtype, with frequent CCND1/CDKN2A alterations and high CDK4/6 activity, suggested sensitivity to CDK4/6 inhibitors; the basal subtype, with elevated EGFR ligands and pathway activity, indicated potential response to EGFR mAbs; and the immune subtype, characterized by high immune checkpoint protein expression, pointed to benefit from checkpoint inhibitor therapies.
3.3 Analysis of the tumor immune landscape
Immunotherapy shows great potential for cancer treatment, but it currently benefits only a small proportion of patients. A deeper understanding of the cancer immune microenvironment can illuminate how to leverage a patient’s immune system for effective anti-cancer therapies. Proteogenomic analyses of the tumor immune landscape improve our understanding of TME heterogeneity and immune infiltration patterns, both of which are critical for immunotherapy response. Proteomics and phosphoproteomics provide valuable insights into the functional molecules that drive immune cell surveillance and tumor immune evasion, insights that are often missed by genomic approaches alone (Fig. 2C).
Integrated proteogenomic analysis of adult GBM [
87] identified four immune subtypes with distinct immune cell compositions, validated by single-nucleus RNA sequencing (snRNA-seq). Notably, the immune subtype (im3), enriched in T cells and natural killer (NK) cells but depleted of immunosuppressive macrophages-microglia infiltration, was primarily observed in IDH-mutated tumors, and immune-related pathways such as ferroptosis, mast cells, and reactive oxygen species showed negative correlations with H2B acetylation.
A proteogenomic study of melanoma [
29] identified three tumor subtypes (S1–S3) with distinct immune signatures and clinical outcomes. The S1 subtype, enriched in CD4+ and CD8+ T cells and expressing high levels of CD8A, PDCD1, CD247, CD274, and CD3D, showed elevated ECM-receptor and MAPK pathway activity. Mechanistically, MAPK7 kinase activity enhanced NFκB2-driven cytokine expression, promoting CD8+ T cell recruitment, and anti-PD-1-treated cohorts confirmed that MAPK7-NFκB signaling correlates with improved immunotherapy response.
Integrative proteogenomic analysis of 218 pediatric brain tumors revealed substantial diagnostic-specific heterogeneity in the tumor microenvironment [
139]. Overall, low immune infiltration correlated with more aggressive tumor types, whereas higher immune infiltration characterized LGG
BRAFWT-rich, LGG
BRAFFusion-rich, and Cranio/LGG
BRAFV600E tumors. Based on immune and stromal characteristics, researchers identified five subtypes, cold-medullo, cold-mixed, neuronal, epithelial, and hot. The hot subtype, comprising LGG, high-grade glioma (HGG), and ganglioglioma, is enriched in immune cells and immune-related pathways, whereas cold-medullo and cold-mixed subtypes show low immune infiltration but elevated WNT/β-catenin, apoptosis, and proteasome signaling. The neuronal subtype exhibits enhanced glutamate signaling and energy metabolism with reduced expression of genes related to CD4, CD8A, and macrophages, while the epithelial subtype, including CP tumors, shows EMT and immune checkpoint activation, suggesting potential sensitivity to checkpoint blockade.
Integrated proteogenomics provided a comprehensive characterization of the immune landscape in LUAD [
8]. Immune-hot tumors displayed strong signatures of B cells, CD4+/CD8+ T cells, dendritic cells, and macrophages, along with upregulated immune-related and signaling pathways, particularly evident at the proteomic level, highlighting potential therapeutic vulnerabilities such as anti-CTLA4 therapy and IDO1 inhibition. In contrast, immune-cold tumors exhibited upregulation of metabolic pathways, including glycolysis and PPAR signaling, suggesting metabolic barriers to immune infiltration. Notably,
STK11 mutant tumors showed markedly reduced immune activation despite high mutation burdens, resulting in an immune-cold phenotype, and demonstrated significant enrichment of neutrophil degranulation, indicating a potential role in immune modulation.
In colon cancer [
66], proteogenomic analyses of the immune microenvironment suggested an interplay between metabolic reprogramming and immune evasion. In microsatellite instability-high (MSI-H) tumors, reduced CD8+ T cell infiltration correlated with increased glycolytic activity, accompanied by elevated protein levels of SLC2A3 and PKM2, a relationship not observed in other subtypes. These findings suggest that combining immune checkpoint blockade with glycolysis inhibition may offer an effective strategy for MSI-H tumors resistant to immunotherapy.
3.4 Discovery of biomarkers for patient selection, subtyping, and prognosis
Tumor biomarkers are objective, quantifiable biological indicators used in disease diagnosis, prognosis evaluation, and clinical treatment guidance. These biomarkers can be sourced from DNA, RNA, proteins, and metabolites. In one survival analysis that integrated somatic CNAs, DNA methylation, mRNA, microRNA, and protein expression data across four TCGA cancer types, only protein expression data provided a prognostic model comparable to clinical variables in LSCC [
172]. Similarly, Wang
et al. [
173] performed proteomic profiling on 44 CRC cell lines and found that protein expression data, compared to gene mutations, CNAs, and mRNA expression, was more effective at identifying known drug-target and drug-pathway relationships, and better predicted therapeutic responses in CRC tumors. In another study by Sinha
et al. [
71], protein abundance-based biomarkers outperformed other data types in prognostic accuracy for PCa, as measured by AUC scores. Additionally, biomarkers derived from methylation-protein paired data significantly outperformed those based solely on genomic data. These findings emphasize the potential of multidimensional data from tumor proteogenomics for biomarker discovery and improving precision in cancer care.
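Comparisons of this kind rest on the area under the ROC curve (AUC), which equals the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it. The Python sketch below computes AUC directly from that definition and compares two candidate markers. The outcome labels and both score vectors are invented for illustration and are not data from the cited studies.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort: 1 = event (e.g. recurrence), 0 = no event.
outcome = [1, 1, 1, 0, 0, 0]
protein_score = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7]  # illustrative protein-based marker
mrna_score = [0.6, 0.3, 0.7, 0.5, 0.4, 0.8]     # illustrative mRNA-based marker

auc_protein = roc_auc(outcome, protein_score)
auc_mrna = roc_auc(outcome, mrna_score)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the gap between two markers evaluated on the same patients gives a direct, threshold-free comparison. With cohorts as small as this toy example the estimate is very noisy, and published comparisons typically rely on cross-validation or independent validation sets.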
Key insights from published cancer proteogenomic studies have revealed potential biomarkers for patient diagnosis, prognosis, molecular subtyping, and treatment response (Fig. 2D). These biomarkers are primarily based on protein abundance, with some also linked to gene mutations and PTMs, such as phosphorylation levels, as outlined below.
3.4.1 Diagnosis
In a proteogenomic study of 48 non-clear cell renal cell carcinoma (non-ccRCC) cases [
32], along with previously reported data from 103 ccRCC samples [
9], Li
et al. conducted multi-omics analyses and identified several biomarkers. Specifically, GPNMB, ADGRF5, and MAPRE3 were found to distinguish benign renal oncocytomas (ROs) from chromophobe RCC (chRCC), while PIGR and SOSTDC1 differentiated papillary RCC (pRCC) from mucinous tubular and spindle cell carcinoma (MTSCC). These findings address the clinical need for biomarkers that are specific to rare subtypes of renal cell carcinoma.
Comparative multi-omics analyses of 140 PDAC tumors [
33], alongside 67 NATs and 9 normal pancreatic ductal tissues, identified 12 secreted proteins (SFN, WFDC2, THBS1, THBS2, MDK, CTHRC1, LOXL2, COL12A1, CD55, MFAP2, LAMC2, and LECT2) that were significantly upregulated in PDAC tumors, particularly in early-stage cases. These findings suggest that these proteins could serve as potential early detection biomarkers in serum or pancreatic juice, highlighting their utility for improved diagnosis in PDAC.
To address the need for complementary serum biomarkers for patients in the diagnostic “gray zone” of PCa, Dong
et al. [
141] performed proteogenomic profiling of high-risk, treatment-naïve PCa specimens from 125 patients. Supervised proteomic analysis identified six candidate secreted proteins, among which GOLM1 emerged as the most promising. ELISA validation showed that GOLM1 levels were highest in metastatic PCa, correlated with disease progression, and were elevated in primary tumors relative to benign prostatic hyperplasia (BPH). Tissue and serum GOLM1 levels were significantly correlated in matched samples (
n = 52). In an independent cohort of 456 serum samples, GOLM1 consistently increased from healthy/BPH to primary and metastatic PCa and outperformed PSA in distinguishing PCa from BPH within the PSA 4–20 ng/mL gray zone. These findings highlight GOLM1 as a strong noninvasive serum biomarker candidate for PCa diagnosis.
The ALK fusion, often referred to as the “diamond mutation” due to its low prevalence and significant impact on targeted therapy, characterizes one molecular subtype of NSCLC, accounting for 4%–6% of LUAD [
174]. In a proteogenomic study of 110 LUAD tumors [
8], phosphoproteomics showed pronounced ALK Y1507 phosphorylation specifically in ALK-fusion-positive cases. Immunohistochemistry (IHC) using commercial ALK and phospho-Y1507 antibodies confirmed tumor-specific positive staining in all ALK-fusion samples, with no signal in ROS1/RET fusion tumors or paired NATs. These findings highlight ALK Y1507 phosphorylation as a promising diagnostic marker for ALK-fusion LUAD.
Zhang
et al. [
165] performed an integrated proteogenomic characterization across the major histological types of pituitary neuroendocrine tumors (PitNET) and indicated that
GNAS copy number gain can serve as a reliable diagnostic marker for hyperproliferation in the POU1F1 (PIT1) lineage.
3.4.2 Prognosis
In the study of SCLC [
10], supervised proteomic analysis identified HMGB3 and CASP10 as prognostic biomarkers: high HMGB3 expression correlated with poorer survival, whereas reduced CASP10 expression indicated a more favorable prognosis. These associations were validated by IHC in both the discovery cohort and an independent SCLC cohort, supporting their potential clinical utility. In LUAD [
80], investigators hypothesized that tumor proteins with prognostic relevance might also be detectable in circulation. Proteomic analysis of 103 LUAD tumors identified HSP90AB1 as overexpressed in tumors relative to NATs and associated with poor OS. Plasma ELISA in an independent cohort of 705 LUAD patients and 282 healthy controls confirmed elevated circulating HSP90AB1 in patients and its negative prognostic correlation, supporting its potential as a blood-based prognostic biomarker for LUAD.
In HBV-related HCC [
83], supervised proteomic analysis identified PYCR2 and ADH1A as prognostic biomarkers: PYCR2 was highly expressed, particularly in the proliferative, poorest-prognosis subtype, while ADH1A showed consistently reduced expression, with the lowest levels in the same subtype. Higher PYCR2 and lower ADH1A expression correlated with worse survival, and immunostaining on tissue microarrays from the discovery set and an independent cohort of 243 cases confirmed their clinical relevance. In iCCA [
140], HKDC1 and SLC16A3 emerged as robust prognostic markers in a 262-patient proteogenomic discovery study and were validated in an independent cohort of 222 patients. Functional assays demonstrated tumor-suppressive activity for HKDC1 and strong oncogenic activity for SLC16A3. Additionally, in a separate study of 217 CCAs [
150], TCN1 was identified as a potential prognostic biomarker that promotes cell growth by enhancing vitamin B12 metabolism. The prognostic significance of TCN1 was validated in both the TCGA cohort [
175] and an independent cohort from Dong
et al. [
140].
In a comprehensive proteogenomic study of pediatric brain cancer [
139], Petralia
et al. evaluated the prognostic value of IDH protein abundance in pediatric HGG, where IDH mutations are uncommon. The relationship between IDH protein levels and OS was assessed with adjustment for histone H3 status, because
H3K27M point mutations are known to correlate with poorer outcomes in HGG [
176]. The results revealed a positive association between IDH protein abundance and improved OS in the
H3 wild-type (
H3WT) group. Multivariate Cox analysis showed that a 50% reduction in combined IDH1/2 abundance conferred a 23.6-fold higher risk of death in
H3WT HGG. Validation in an independent TMT-based proteomic cohort of 41 pediatric HGGs confirmed that lower IDH1/2 levels predicted shorter survival. These findings establish wild-type IDH1/2 proteins as a prognostic biomarker in
H3WT HGG.
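To make the scale of this effect concrete, the Python sketch below shows how a hazard ratio for any change in a covariate follows from a Cox regression coefficient. It assumes, purely for illustration, that protein abundance enters the model on a log2 scale, so that a 50% reduction corresponds to a change of -1. The coefficient is back-calculated from the reported 23.6-fold figure and is not a value taken from the study.

```python
import math


def hazard_ratio(beta, delta):
    """Hazard ratio for a change `delta` in a covariate with Cox coefficient `beta`."""
    return math.exp(beta * delta)


# Assumption: abundance is modeled as log2 intensity, so halving it is a change of -1.
# Illustrative coefficient, back-calculated so that halving gives a 23.6-fold hazard.
beta_log2_idh = -math.log(23.6)

hr_halving = hazard_ratio(beta_log2_idh, -1.0)                   # 50% reduction
hr_quarter_less = hazard_ratio(beta_log2_idh, math.log2(0.75))   # 25% reduction
hr_no_change = hazard_ratio(beta_log2_idh, 0.0)                  # reference level
```

Under the proportional hazards model the effect is multiplicative, so two successive halvings would multiply the hazard by 23.6 twice. Whether the original analysis used a log2 scale or another transformation is not stated in this section, which is why the mapping from "50% reduction" to a covariate change of -1 is flagged as an assumption.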
In MTC [
168], Shi
et al. identified two overexpressed tenascin family members, tenascin-C (TNC) and tenascin-X (TNXB), in the most unfavorable mesenchymal subtype, using both MS and IHC data. Elevated levels of TNC or TNXB were associated with more advanced disease stages and were significant predictors of markedly poorer structural recurrence-free survival (SRFS). These findings suggest that TNC and TNXB may serve as potential prognostic biomarkers for MTC.
Beyond protein biomarkers, Xiang
et al. [
29] revealed that
PRKDC amplification was a potential prognostic marker for melanoma, and further proteogenomic analysis combined with functional experiments suggested that
PRKDC amplification may promote tumor proliferation by activating DNA repair and folate metabolism pathways.
3.4.3 Molecular subtyping
As detailed above, Dong
et al. [
140] identified four distinct proteomic subtypes in iCCA. Four subtype-specific markers were selected to represent these subtypes: MPO (neutrophil) for the inflammatory subtype (S1), POSTN (fibroblast) for the mesenchymal subtype (S2), ALDOB (metabolism) for the metabolic subtype (S3), and EPCAM (biliary) for the differentiated subtype (S4). Multiplex immunostaining of these four subtype-specific biomarkers in both the discovery cohort and an independent validation cohort demonstrated the ability to stratify patient survival effectively.
3.4.4 Therapeutic response
In EC, a proteogenomic study [
67] of 138 tumors across 10 omics platforms identified potential biomarkers to guide precision therapy.
PIK3R1 in-frame indels were linked to elevated AKT1 phosphorylation, suggesting sensitivity to AKT inhibition. High MYC activity emerged as a marker for metformin responsiveness, including in nondiabetic, non-obese patients.
CTNNB1 exon 3 hotspot mutations stabilized β-catenin, potentially limiting the efficacy of Wnt-FZD antagonists, detectable via protein-based assays such as IHC. 1q amplification increased glycoprotein levels, indicating potential for PARP-inhibitor response. Finally, antigen processing machinery (APM) status was proposed as a biomarker for immune checkpoint inhibitors (ICI) response, measurable by targeted selected reaction monitoring (SRM) proteomics to quantify APM activity for patient stratification.
In another EC study, Hu
et al. [
55] conducted proteogenomic analysis of 81 EEEC samples and explored predictive biomarkers for progestin response in fertility-sparing treatment. Integrating overexpressed proteins in progestin-insensitive patients with six public hormone therapy data sets and correlating RNA levels with progression-free survival (PFS), four proteins—EEF1E1, ILVBL, SRPK1, and NUDT5—were identified. IHC validation supported their potential as markers of progestin resistance in EEEC.
In a CC proteogenomic study, Yu
et al. [
116] identified the kinase PRKCB as a tumor suppressor gene (TSG) and a potential biomarker for radioresponse. Phosphoproteomic analysis showed that PRKCB activity was specific to subgroup 3, and integration with proteomics and PFS data from radiotherapy patients highlighted its association with favorable radioresponse, validated by IHC. Multi-omics analysis further revealed positive correlations with immune pathways and negative associations with cell cycle and DNA replication, while functional studies suggested that PRKCB may enhance radiosensitivity by modulating cell cycle progression.
Proteogenomic dissection of
CDKN2A mutations in 108 LSCC cases [
155] revealed that loss of CDK4/6 pathway inhibitors is common, while RB1 expression and phosphorylation are heterogeneous, particularly in tumors with
CCND1 amplification, potentially explaining variable responses to CDK4/6 inhibitors. RB1 phosphorylation has therefore been proposed as a biomarker to guide CDK4/6 inhibitor therapy in LSCC. Similarly, combined proteogenomic analyses of 122 primary BCs [
63] and the Genomics of Drug Sensitivity in Cancer (GDSC) data suggest that RB1 phosphorylation may predict CDK4/6 inhibitor response in triple-negative breast cancer (TNBC) cases.
3.5 Identification of new potential drug targets
Most drug targets are proteins, and integrative proteogenomic analysis aids in identifying new potential drug targets (Fig. 2E). In early-stage HCC, Jiang
et al. [
177] identified a poor-prognosis subtype marked by cholesterol-metabolic reprogramming, with sterol O-acyltransferase SOAT1 strongly upregulated and correlated with poorer outcomes. Pharmacologic inhibition with avasimibe showed significant anti-tumor activity in patient-derived xenograft (PDX) models, highlighting SOAT1 as a potential therapeutic target.
In DC, Li
et al. [
56] reported progressive upregulation of alanyl-tRNA synthetase (AARS1), and proteogenomic analysis coupled with biological experiments demonstrated that AARS1-mediated lysine-alanylation of PARP1 suppresses apoptosis and promotes tumor growth. These findings position AARS1 as a candidate target in DC therapy.
In esophageal squamous cell carcinoma (ESCC) [
57], multi-omics profiling revealed aberrant glycolysis driven by elevated protein and phosphoprotein levels of PGK1. ERK2 was identified as the top kinase associated with the phosphorylation motif activating PGK1 at S203. Biological experiments demonstrated that PGK1 activation enhanced glycolysis and serine synthesis while blocking pyruvate dehydrogenase activity. Hyperphosphorylated PGK1 (S203) thus represents a potential druggable vulnerability.
In ccRCC [
78], supervised proteogenomic analysis identified NNMT as a drug target due to its overexpression and association with poor prognosis. NNMT elevates homocysteine (Hcy) and K-Hcy modifications, particularly on DNA-PKcs, promoting proliferation and resistance to radiation therapy. Inhibiting K-Hcy modification with N-acetyl-cysteine (NAC) attenuated radiation resistance, underscoring NNMT as a therapeutic target.
In PDAC [
33], integrated proteomic and phosphoproteomic analyses revealed broad upregulation of PAK1/PAK2, key downstream effectors of KRAS regulating cytoskeletal motility, proliferation, and survival. These kinases represent promising targets, potentially in combination with inhibitors targeting the canonical KRAS downstream pathways, including MAPK/ERK and PI3K/AKT/mTOR, for
KRAS mutant PDAC.
In multiple myeloma (MM) [
145], proteomic and single-cell analyses identified FCRL2 as a selectively expressed surface protein on malignant plasma cells. Its limited expression on healthy plasma cells, B cells, and other hematopoietic cells suggests FCRL2 as a promising target for MM immunotherapy.
In LUAD [
8], subtype-specific phosphoproteomic signatures highlighted PTPN11 phosphorylation (notably Y62, Y546, and Y584) as a target in
EGFR mutant and ALK-fusion tumors, while increased SOS1 S1161 phosphorylation suggested its therapeutic potential in
KRAS mutant LUAD.
3.6 Prediction of personalized therapy strategies
Proteogenomics integrates multi-dimensional molecular data to advance cancer research, ultimately facilitating the development of personalized therapeutic strategies (Fig. 2F). Liu
et al. [
10] performed a proteogenomic characterization of 107 SCLC tumors, defining four molecular subtypes (nmf1–nmf4) with distinct therapeutic vulnerabilities: nmf1 (high proliferation/NE differentiation) suggested sensitivity to E/P chemotherapy; nmf2 (high DLL3) to anti-DLL3 agents such as tarlatamab; nmf3 (elevated EMT/RTK signaling) to RTK inhibitors; and nmf4 (non-NE, high MYC/POU2F3) to AURK inhibitors. These subtype-specific responses were validated
in vitro and in PDX/CDX models, underscoring the value of molecular subtyping for precision therapy in SCLC.
In high-grade serous ovarian cancers (HGSOCs), Chowdhury
et al. [
96] analyzed 242 pre-treatment biopsies and identified a 64-protein signature that robustly predicts platinum-refractory disease, correctly identifying 35% of refractory tumors with 98% specificity. The signature was validated in two independent cohorts and may be evaluated through a high-throughput and clinical-grade multiplexed MRM-MS assay, enabling an integrated protein-based score to guide frontline chemotherapy decisions.
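The two figures quoted for the signature are its sensitivity (the fraction of refractory tumors that are flagged) and its specificity (the fraction of non-refractory tumors that are correctly left unflagged). The Python sketch below computes both from binary calls. The counts are invented so that they reproduce the reported percentages and do not reflect the actual composition of the cohort.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = refractory."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 20 refractory tumors (7 flagged by the signature)
# and 100 non-refractory tumors (2 flagged in error).
y_true = [1] * 20 + [0] * 100
y_pred = [1] * 7 + [0] * 13 + [1] * 2 + [0] * 98

sens, spec = sensitivity_specificity(y_true, y_pred)
```

An operating point with high specificity and modest sensitivity flags relatively few patients but produces few false alarms, which matters when a positive call would redirect a patient away from standard frontline chemotherapy. How many flagged patients are truly refractory also depends on how common refractory disease is in the population tested, which these toy counts do not attempt to represent.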
In ccRCC, Zhang
et al. [
97] integrated multi-omics profiles from 115 tumors to construct a random forest (RF) model predicting response to the TKI sunitinib. Incorporating clinical, genomic, transcriptomic, and proteomic features, the model achieved high accuracy (ROC-AUC 0.86 in training; 0.98 in testing). Proteomic and transcriptomic variables contributed most strongly, demonstrating the power of multi-omics integration for treatment-response prediction in ccRCC.
Pino
et al. performed a proteogenomic characterization of AML and defined four subtypes based on mRNA, protein, and phosphosite profiles [
95]. Integrating these subtypes with
ex vivo responses to 46 drugs revealed distinct therapeutic vulnerabilities. Subtype 1 was resistant to several agents but sensitive to the histone deacetylase inhibitor panobinostat. Subtype 2 exhibited sensitivities common in FLT3-ITD samples, while also demonstrating responsiveness to venetoclax (a BCL2 inhibitor) and an NF-κB inhibitor, highlighting these agents as potential therapeutic options even in cases where FLT3-ITD mutations were present. These findings show that multi-omics subtyping captures molecular features beyond mutation status. The authors also developed a drug-response prediction model to support treatment selection and refine therapeutic strategies for relapsed patients.
3.7 Pan-cancer proteogenomic analysis
Proteogenomic profiling has significantly advanced our understanding of the molecular mechanisms within individual tumor cohorts. Building on these foundations, the next wave of research has shifted to characterizing molecular features across diverse cancers through pan-cancer proteogenomic analyses. For instance, CPTAC and other investigators have collectively made publicly available MS-based proteomic data for over 2000 human tumors, complemented by clinical parameters and multi-omics data, including somatic mutations, CNVs, and mRNA expression. These comprehensive data sets enable systematic pan-cancer analyses, providing a unique opportunity to uncover novel molecular subtypes, define tumor immune landscapes, explore the impacts of genomic aberrations, elucidate extensive post-translational regulatory networks, and identify actionable biomarkers and therapeutic targets at the pan-cancer level. This approach holds significant promise for advancing precision oncology and fostering the development of novel cancer treatments.
3.7.1 Pan-cancer molecular subtyping
Pan-cancer classification based on omics data offers a powerful approach to understanding the molecular similarities and differences among various tumor types, independent of their tissue of origin. For example, building on pan-cancer molecular subtyping of transcriptomic data from 10 224 samples across 32 tumor types in the TCGA data set [
178], Chen
et al. further used proteomic data to classify 532 samples from six tumor types into 10 molecular subtypes [
179], overall providing complementary insights to transcriptomic classifications by capturing information not discernible at the RNA level. Another proteogenomic-based pan-cancer study classified 2002 samples from 14 tumor types into 11 subtypes [
180], two of which are specifically enriched for brain tumors with distinct characteristics; this classification shows substantial concordance with the previous 10 proteomic subtypes [
179]. More recently, the CPTAC consortium used proteogenomic data to classify 1064 samples from 10 tumor types into four molecular subtypes [
181], uncovering shared genetically driven pathways across tumors and subtype-specific protein and phosphoprotein expression landscapes. Overall, pan-cancer classifications based on multi-omics data enhance our understanding of tumors at the molecular level and provide valuable resources for deeper exploration of tumor biology and potential therapeutic strategies.
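As an illustration of how concordance between two subtype assignments of the same samples might be quantified, the following minimal sketch computes the adjusted Rand index (ARI). This is only one common choice; the metric actually used in the cited studies is not stated here, and the function name and toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples.

    1.0 means identical partitions (label names may differ); values
    near 0 indicate agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")

    # Count sample pairs that fall together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # chance agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        # Degenerate case, e.g. both labelings put all samples in one group.
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Toy, made-up subtype calls for six samples from two classifications.
proteomic_10 = ["s1", "s1", "s2", "s2", "s3", "s3"]
proteogenomic_11 = ["k2", "k2", "k1", "k1", "k3", "k3"]
ari = adjusted_rand_index(proteomic_10, proteogenomic_11)
```

Because the ARI depends only on which samples are grouped together, it is unaffected by the two classifications using different subtype names or numbers of subtypes.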
3.7.2 Pan-cancer tumor immunity
One of the major challenges in tumor immunotherapy is overcoming drug resistance or poor treatment response. Classifying tumors based on their immune profiles to predict drug response and guide personalized therapy is crucial to advancing immunotherapy research. Using 29 TME functional gene expression signatures, Bagaev
et al. [
182] analyzed transcriptomic data from over 10 000 samples across approximately 20 tumor types and identified four immune categories: immune-enriched, fibrotic (IE/F); immune-enriched, non-fibrotic (IE); fibrotic (F); and immune-depleted (D). These classifications were shown to predict immunotherapy response, with the IE subtype being responsive and the F subtype non-responsive. Recently, CPTAC investigators classified 1056 samples from 10 tumor types into seven immune subtypes based on cell type fractions and 427 immune-related signatures [
183]. This study not only provided a detailed description of the genomic, epigenetic, transcriptomic, and proteomic characteristics associated with each subtype but also validated the CD8+/IFNG+ subtype as highly responsive to immunotherapy using clinical trial data. These findings offer valuable insights for guiding individualized clinical treatments.
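Signature-based immune classification of this kind starts from a per-sample score for each gene signature. The sketch below uses a deliberately simplified score, the mean z-score of the signature genes, as an assumption for illustration; it is not the scoring method of the cited studies, and the gene names and values are made up.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression values across samples."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def signature_scores(expr, signature):
    """Per-sample signature score as the mean z-score of signature genes.

    expr: dict mapping gene -> list of per-sample expression values
          (all lists in the same sample order).
    signature: iterable of gene names; genes absent from expr are skipped.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in expression data")
    z = [zscores(expr[g]) for g in genes]
    n_samples = len(z[0])
    return [mean(row[i] for row in z) for i in range(n_samples)]


# Toy expression matrix for four samples (hypothetical genes and values).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [5.0, 5.0, 5.0, 5.0],
}
scores = signature_scores(expr, ["G1", "G2", "NOT_MEASURED"])
```

Scores for several signatures computed this way could then be clustered across samples to define immune categories; that clustering step is not shown.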
In addition, given the limitations of current immunotherapies, uncovering new molecular mechanisms and identifying novel therapeutic targets is essential. Pan-cancer data provides a rich resource for such discoveries. In the aforementioned CPTAC study [
183], genomic analyses revealed subtype-specific molecular mechanisms, including the role of STK11 in reducing immune infiltration in patients with activated interferon-γ signaling, which may inform the development of related immunotherapies. Zhang
et al. [
184] used T cell exhaustion (TEX) signaling pathway characteristics to classify five TEX subtypes across cancers and identified four previously unknown TEX-associated genes (TLL1, P2RY8, MYH11, and PRKD2) by analyzing 568 cancer driver genes, offering potential targets for personalized immunotherapies. Sheng
et al. [
185] evaluated the contributions of intra-tumoral bacteria and fungi to tumor immunity by analyzing 9853 samples from 33 tumor types. They demonstrated that the tumor-resident microbiome holds prognostic value, further expanding our understanding of tumor immunity.
3.7.3 Pan-cancer tumor biology
Tumors often arise from genetic mutations, such as deletions of TSGs or amplifications of oncogenes. Pan-cancer data serves as a valuable resource for integrating proteomics and genomics, providing deeper insights into the impacts of genetic aberrations. For instance, Li
et al. [
181] analyzed genome, transcriptome, proteome, and phosphoproteome data from 1064 samples spanning 10 tumor types to understand the functional states of oncogenic drivers and their links to cancer development. By correlating driver genes with multi-omics data, they performed systematic analyses of
cis-effects,
trans-effects, and protein–protein interaction networks, uncovering the mechanisms underlying driver mutations in cancer. snRNA-seq analysis revealed that tumors with microsatellite instability (MSI) could be sensitive to nonsense-mediated decay (NMD) inhibition. Further examination of CNA events using kinase libraries identified links between CNAs and cyclin-dependent kinase (CDK) activation, such as
CDKN2A deletions and
RB1 alterations, and suggested that CDK inhibitors may offer therapeutic benefits for tumor types with genomic alterations such as
MCL1 or
ERBB2 amplification.
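To make the distinction between cis- and trans-effects concrete, the following minimal sketch correlates one gene's copy number with the abundance of its own protein (cis) and with the abundance of every other protein (trans). It assumes a simple Pearson correlation on toy values; the cited study's actual statistical procedure is not reproduced here, and all names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(vx * vy)


def cis_trans_effects(cna, protein, driver):
    """Correlate one driver gene's copy number with protein abundance.

    cna, protein: dicts mapping gene -> list of per-sample values,
                  with samples in the same order in both.
    Returns (cis_r, trans_r_by_gene); cis_r is None if the driver's
    own protein was not measured.
    """
    x = cna[driver]
    cis = pearson(x, protein[driver]) if driver in protein else None
    trans = {g: pearson(x, v) for g, v in protein.items() if g != driver}
    return cis, trans


# Toy, made-up values for five samples.
cna = {"GENE_A": [-1.0, 0.0, 0.0, 1.0, 2.0]}
protein = {
    "GENE_A": [-0.8, 0.1, -0.1, 0.9, 2.1],
    "GENE_B": [1.0, 0.0, 0.0, -1.0, -2.0],
}
cis_r, trans_r = cis_trans_effects(cna, protein, "GENE_A")
```

In a real analysis the correlations would be accompanied by significance testing and multiple-testing correction across genes, which this sketch omits.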
In addition to genetic aberrations, the study of abnormal molecular regulatory networks, which are primarily driven by altered expression levels of key proteins such as kinases and by PTMs such as phosphorylation and acetylation, is also vital in cancer biology. Pan-cancer proteogenomic data provide an extensive resource for investigating these molecular regulatory networks. For instance, as previously mentioned, CPTAC investigators clustered pan-cancer samples into four subtypes based on proteogenomic data [
181], detailing differences in protein expression and phosphorylation site levels among subtypes. This was the first study to comprehensively elucidate the similarities and differences in phosphorylation modifications across various cancers. Additionally, Geffen
et al. [
186] utilized proteomic data from 1110 samples across 11 tumor types to uncover shared PTM regulatory patterns in different cancers. These include phosphorylation-driven dysregulation of DNA repair, acetylation-driven metabolic alterations associated with immune responses, and crosstalk between acetylation and phosphorylation that influences kinase specificity and histone regulation. Beyond describing aberrant PTM patterns, research also focuses on identifying PTM executors like kinases. For example, Elmas
et al. [
187] developed a novel algorithm called OPPTI to analyze proteomic and phosphoproteomic data from 10 tumor types, identifying 23 overexpressed, druggable kinase targets.
3.7.4 Pan-cancer therapeutic targets
The integration of pan-cancer proteogenomic data offers a robust platform for identifying therapeutic targets and advancing cancer treatments, as proteins are the primary functional effectors and drug targets in oncology. For instance, Sengupta
et al. [
188] leveraged the Database of Evidence for Precision Oncology (DEPO) to correlate druggability with genomic, transcriptomic, and proteomic biomarkers using a pan-cancer cohort of 6570 samples, leading to the identification of potentially druggable biomarkers. Similarly, Zhou
et al. [
189] identified 1139 therapeutically targeted proteins in proteomic data across 16 tumor types, highlighting the breadth of targetable proteins available for exploration. Further advancing this field, Savage
et al. [
190] utilized CPTAC pan-cancer proteogenomic data in combination with gene dependency data from cell lines to systematically identify novel therapeutic candidates. Their work uncovered a range of targets, including overexpressed and hyperactive proteins, proteins linked to TSG loss, potential neoantigens, and tumor-associated antigens. Notably, they identified five
KRAS mutant peptides as potential neoantigens shared across four tumor types. Personalized treatment strategies have also emerged from such analyses. For example,
TP53 loss mutations have been proposed as candidate biomarkers for CHEK1 inhibition in select BC cases and for doxorubicin therapy in endometrial cancer patients. These findings underscore the value of integrating multi-omics data for the systematic exploration of cancer therapeutic targets, paving the way for precision oncology approaches tailored to individual patients’ molecular profiles.
4 Conclusions, limitations, and future directions
Proteogenomics is rapidly emerging as a transformative discipline in precision oncology, offering a comprehensive framework that bridges genomic alterations with protein-level changes and PTMs. This integrated approach has uncovered novel biomarkers, therapeutic targets, and molecular subtypes, significantly advancing our understanding of cancer biology and therapeutic resistance. By filling gaps left by single-omics approaches, proteogenomics has demonstrated its potential to transform cancer research and clinical care.
Despite its promise, proteogenomics faces several limitations and challenges. Firstly, variability in sample collection and preservation (e.g., FrFr vs. FFPE) often impacts data quality and reproducibility. Differences in proteomics methodologies, such as DDA, DIA, and TMT, as well as the use of diverse database search tools, hinder the harmonization of proteogenomic data. While initiatives such as CPTAC have made significant strides in standardizing workflows, a globally consistent framework for data collection and processing remains essential. In addition, while individual cohort studies, such as those for LUAD, provide valuable insights, the lack of standardized protocols for sample handling, data processing, and analysis limits the integration of findings across cohorts. SOPs for consistent data collection and computational methods capable of unifying disparate data sets are urgently needed.
Secondly, current proteogenomic studies predominantly use bulk tumor and NAT samples, which provide depth in molecular characterization but fail to capture the spatial and cellular heterogeneity within tumors. Emerging spatially resolved proteogenomics and single-cell genomics and proteomics approaches hold promise for uncovering TME dynamics and evolutionary processes, offering a more nuanced understanding of cancer biology.
Thirdly, most proteogenomic studies focus on treatment-naïve or surgically resected tumors, limiting insights into metastatic disease and treatment response. Future proteogenomic efforts must integrate analyses of treated tumors and incorporate data from therapeutic trials to bridge this gap.
Furthermore, many proteogenomic studies generate extensive data-driven associations that are hypothesis-generating rather than providing definitive biological conclusions or therapeutic strategies. Moving beyond exploratory findings toward actionable therapeutic strategies demands robust downstream biological and clinical studies.
Lastly, proteogenomics has identified a range of potential therapeutic targets and biomarkers, yet translating these findings into clinical applications remains a critical challenge. For instance, many proteogenomic studies have uncovered novel molecular subtypes, often linked with specific biomarkers and therapeutic strategies. However, advancing these discoveries to clinical implementation requires efforts in several key areas, including developing molecular subtyping tools based on proteogenomic data that integrate tumor classification with specific biomarkers to guide clinical diagnosis and treatment decisions, leveraging the characteristics of molecular subtypes to design targeted therapeutic approaches, and validating these strategies through robust preclinical and clinical studies.
Collectively, as these approaches and technologies continue to advance, the integration of interdisciplinary collaboration, robust data sharing, and active clinical engagement is expected to enable proteogenomics to reach its full potential, transforming cancer research and precision medicine and ultimately enhancing patient outcomes.