Introduction
Phosphorylation and dephosphorylation of key cellular factors and the deregulation of kinase signaling pathways are commonly associated with various cancers. In 2011, 11 years after their original review of the hallmarks of cancer (Hanahan and Weinberg, 2000), Douglas Hanahan and Robert A. Weinberg published an updated review in Cell describing the newly emerged hallmarks of cancer (Hanahan and Weinberg, 2011). In this paper, they described the progress made over the past decade in understanding the six hallmarks summarized in their original 2000 presentation, and also addressed new research developments concerning two enabling characteristics and two emerging hallmark capabilities. Phosphorylation events are central to the development of cancer and are involved in most of these ten hallmarks.
Among the many biochemical mechanisms involved in cellular signaling, protein phosphorylation is one of the most common means of regulating a variety of biological processes, including cell proliferation, migration, DNA repair, and apoptosis. Because phosphorylation is reversible and can occur rapidly, it allows cells to respond quickly to extracellular stimuli. Over the past decades, great effort has been made to estimate the phosphorylation levels of signaling pathways in different types of cancer. Phosphorylation events have been reported in lung cancer (Zhang et al., 2011b; Jun et al., 2012), breast cancer (Kok et al., 2009; Lin et al., 2010; Chen and Gallo, 2012), colorectal cancer (Xu et al., 2010; Wang and Basson, 2011), pancreatic cancer (Zhao et al., 2008; Yu et al., 2012), and prostate cancer (Furic et al., 2010; Yu and Luo, 2011).
In the past, the phosphorylation of individual molecules was typically studied one at a time using biochemical approaches. Traditional biological and genetic methods for studying phosphorylation include western blotting, immunostaining, and radioactive labeling (typically with ³²P) to analyze changes in the amount of phosphoproteins or their phosphosites. These methods are labor-intensive even for a limited number of proteins, which makes analysis of the pathways those proteins participate in difficult. In the past few years, scientific interest in the phosphorylation status of proteins and peptides has increased, as has the number of methodologies available to examine it. High-throughput array-based technologies that have been developed include peptide arrays, reverse-phase protein arrays, and antibody arrays, which shorten lengthy, tedious workflows. The development of mass spectrometry-based proteomic and phosphoproteomic technologies has opened another door for efficiently monitoring phosphorylation events in a high-throughput manner. Mass spectrometry-based phosphoproteomics has made it possible to study complicated phosphorylation networks within a reasonable amount of time. In the last two decades, phosphoproteomic technologies have made great progress, and more sensitive phosphopeptide enrichment methods and instruments have been developed. With these technologies, the numbers of identified phosphoproteins and phosphorylation sites are growing rapidly, giving us the potential to discover and analyze complicated phosphorylation-based signaling networks.
Furthermore, because phosphorylation status is not simply “phosphorylated” or “dephosphorylated,” quantitative phosphoproteomic studies comparing phosphorylation levels among different types of samples, such as non-cancerous versus cancerous or early-stage versus late-stage, can also provide valuable information on how biological states are controlled through phosphorylation events.
In this review, we discuss the main developments in phosphoproteomic analysis, as well as the common processes and technologies used in phosphoproteomic research to identify and quantify as many phosphopeptides/phosphoproteins as possible and to accurately assign their phosphosites. Approaches for analyzing phosphorylation events in cancer research are also reviewed.
Phosphoproteomic approaches for identification of phosphoproteins
To date, two mass spectrometric strategies have been most widely used in proteomic and phosphoproteomic studies: top-down and bottom-up analysis. In top-down strategies, intact proteins or phosphoproteins are fractionated by gel electrophoresis or enriched with antibodies, then directly fragmented and identified by mass spectrometry (Whitelegge et al., 2006). Top-down analysis offers higher protein sequence coverage, but its sample preparation and chromatography are more challenging than those of the alternative bottom-up strategy. Several reviews discuss top-down proteomic approaches in detail (Reid and McLuckey, 2002; Bogdanov and Smith, 2005; Whitelegge et al., 2006; Siuti and Kelleher, 2007; Breuker et al., 2008; Cui et al., 2011).
On the other hand, for proteomic and especially phosphoproteomic studies, more groups employ bottom-up strategies. In bottom-up analysis, researchers typically digest whole cell/tissue lysates, or the proteins they are interested in, into peptide mixtures for shotgun analysis to identify the constituent proteins. For phosphoproteomic research, extra steps are normally included for pre-fractionation and enrichment of phosphorylated peptides. Generally speaking, protein mixtures extracted from cells or tissues are first treated with reducing and alkylating reagents to break disulfide bonds and disrupt three-dimensional protein structure, making the proteins more accessible to digestion enzymes. The protein mixtures are then treated with site-specific enzymes or chemical reagents such as trypsin (cleaves after arginine and lysine unless followed by proline; Olsen et al., 2004), Lys-C (cleaves after lysine), Asp-N (cleaves before aspartate), or cyanogen bromide (cleaves after methionine; Villa et al., 1989) to cut the proteins into smaller peptides. The resulting peptide mixtures are then fractionated and enriched with methods such as strong cation exchange chromatography (SCX) or affinity-based chromatography, desalted, lyophilized, and finally analyzed by liquid chromatography-mass spectrometry (LC-MS) for identification. In this part, we focus mainly on bottom-up strategies and discuss the typical steps of bottom-up phosphoproteomic research for phosphopeptide identification.
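The cleavage specificities listed above can be expressed as simple sequence rules. The sketch below is a minimal in-silico digestion in Python, assuming complete cleavage (no missed cleavages) and ignoring the known exceptions of real enzymes; the function name `digest` and the rule table are illustrative, not part of any particular software package. Cyanogen bromide is included as a chemical cleavage rule for comparison.

```python
import re

# Simplified, illustrative cleavage rules written as zero-width regex split points.
# Real enzymes show exceptions and missed cleavages that are not modeled here.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, unless the next residue is P
    "lys-c": r"(?<=K)",            # after K
    "asp-n": r"(?=D)",             # before D
    "cnbr": r"(?<=M)",             # after M (cyanogen bromide, a chemical reagent)
}


def digest(sequence: str, agent: str = "trypsin", min_length: int = 1) -> list[str]:
    """Return the peptides of a complete in-silico digest of `sequence`."""
    pattern = CLEAVAGE_RULES[agent.lower()]
    # re.split accepts zero-width patterns in Python 3.7+; empty pieces
    # (e.g. after a C-terminal K or before an N-terminal D) are dropped
    # by the length filter.
    return [p for p in re.split(pattern, sequence) if len(p) >= min_length]


if __name__ == "__main__":
    protein = "PEPTIDEKAAARPGGKSM"  # hypothetical sequence
    for agent in CLEAVAGE_RULES:
        print(agent, digest(protein, agent))
```

For the hypothetical sequence above, trypsin yields PEPTIDEK, AAARPGGK, and SM: the R in AAARP is not cleaved because it is followed by proline. In practice, database search engines also allow a small number of missed cleavages, which this sketch does not model.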
Challenges of phosphoproteomics
Compared with the fast-developing proteomic technologies, phosphoproteomic studies face additional challenges. One of the main challenges in the field is that although numerous phosphorylation events occur, most of them are present at low levels (Goshe, 2006). It is estimated that approximately 30% of the proteins in the human proteome contain covalently bound phosphate (Cohen, 2000). However, these phosphoproteins tend to carry multiple phosphorylation sites, each present at low abundance (Goshe, 2006). In fact, it has been suggested that, compared with proteins of higher abundance, proteins of lower abundance in human cells tend to be phosphorylated at more sites (Yachie et al., 2009). The low abundance of these proteins and their low phosphorylation levels make phosphoproteins much more difficult to identify than non-phosphorylated proteins.
During the electrospray ionization (ESI) process, molecules compete with each other for ionization, so that only a limited number of molecules are ionized and enter the mass spectrometer for analysis. This phenomenon, known as ion suppression, was first reported in 1996 (Buhrman et al., 1996) and has been further characterized since (Mallet et al., 2004; Larger et al., 2005). In other words, a mass spectrometer can analyze only a limited number of molecules at a given time. If the sample is too complex, it will mainly provide information on the molecules present at the highest abundance. Because phosphorylated peptides make up only a small percentage of the whole peptide mixture, pre-treatment steps such as sample fractionation to reduce sample complexity and phosphopeptide enrichment before LC-MS analysis are typically required to obtain information on phosphopeptides, especially those present at low abundance (Grimsrud et al., 2010; Imamura et al., 2012; Kanshin et al., 2012).
Another challenge in phosphoproteomic research is that phosphorylation events are regulated by complicated processes and a large number of proteins can be phosphorylated. The PhosphoSitePlus database now lists 177,916 non-redundant phosphorylation sites on 18,719 proteins (http://www.phosphosite.org/, data updated on 20 August 2012; Hornbeck et al., 2012). The fast temporal dynamics of protein phosphorylation regulate the rapid activation and deactivation of cellular signaling pathways, so the choice of sampling time points is another very important factor in phosphoproteomic research. Furthermore, proteins may be phosphorylated at different residues under different conditions. Phosphorylation is also not always an on-off switch for modulating cellular processes and molecular functions, so phosphosite localization and determination of phosphorylation levels are important tasks in phosphoproteomic research as well. In multisite phosphorylation, phosphorylation at one site often controls phosphorylation at another site (Holmberg et al., 2002; Yang, 2005). Current research mostly measures the fold changes of particular phosphosites before and after a treatment. However, to make this evaluation more accurate, stoichiometry analysis of phosphorylation is required, particularly for multisite phosphorylation (Mayya and Han, 2009). In other words, the important tasks in phosphoproteomic research include identifying phosphopeptides and their phosphosites, as well as quantifying their temporal and stoichiometric changes in response to different stimuli (Nita-Lazar et al., 2008).
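The distinction between fold change and stoichiometry can be made concrete with a small calculation. The Python sketch below defines site occupancy as the phosphorylated amount divided by the total (phosphorylated plus unmodified) amount of the same site; it assumes both forms have been measured on a comparable molar scale, which in practice requires standards or correction for differing ionization efficiencies. The functions and the two sites are hypothetical.

```python
def fold_change(after: float, before: float) -> float:
    """Ratio of the phosphorylated form after vs. before a treatment."""
    return after / before


def occupancy(phospho: float, unmodified: float) -> float:
    """Fraction (0-1) of a site that is phosphorylated, from molar amounts."""
    total = phospho + unmodified
    if total <= 0:
        raise ValueError("total amount must be positive")
    return phospho / total


# Hypothetical amounts (fmol) as (phosphorylated, unmodified) for two sites.
sites = {
    "site A": {"before": (1.0, 99.0), "after": (2.0, 98.0)},
    "site B": {"before": (40.0, 60.0), "after": (80.0, 20.0)},
}

for name, s in sites.items():
    fc = fold_change(s["after"][0], s["before"][0])
    occ_before = occupancy(*s["before"])
    occ_after = occupancy(*s["after"])
    print(f"{name}: fold change {fc:.1f}, occupancy {occ_before:.0%} -> {occ_after:.0%}")
```

Both hypothetical sites show the same two-fold increase in the phosphorylated form, but site A moves from 1% to 2% occupancy while site B moves from 40% to 80%. A fold change alone cannot distinguish these situations, which is the motivation for stoichiometry analysis.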
Although more and more phosphosites are being identified in the human proteome, only a limited number have been functionally characterized. We expect that global phosphoproteome studies will provide a large pool of targets for more detailed functional studies.
Separation and enrichment methods
As described above, phosphopeptides exist at relatively low abundance and are largely suppressed by the dominant non-phosphorylated peptides during ionization, so pre-fractionation to reduce sample complexity and enrichment to concentrate phosphopeptides are necessary steps in phosphoproteomic research. To date, several fractionation and enrichment methods have been reported, including chromatography-based fractionation methods such as strong cation exchange (SCX) (Lim and Kassel, 2006; Sui et al., 2008), strong anion exchange (SAX) (Mazsaroff et al., 1987; Kozak et al., 2003; Han et al., 2008a), and hydrophilic interaction liquid chromatography (HILIC) (Alpert, 1990; Mant et al., 1998; McNulty and Annan, 2008); affinity-based methods for phosphopeptide enrichment, such as immobilized metal affinity chromatography (IMAC) (Andersson and Porath, 1986; Lee et al., 2007) and metal oxide affinity chromatography (MOAC) (Wolschin et al., 2005; Rohrig et al., 2008); and antibody-based methods, which are mostly used for phosphotyrosine enrichment. Several recent reviews have summarized these pre-fractionation and enrichment methods (Thingholm et al., 2009; Grimsrud et al., 2010; Eyrich et al., 2011; Rosenqvist et al., 2011; Imamura et al., 2012; Kanshin et al., 2012). In this section, we discuss the most popular and widely used methods for pre-fractionation and enrichment in large-scale phosphoproteomic research. A conceptual drawing of some of the enrichment methods is shown in Fig. 1.
IMAC. IMAC enrichment methods are based on the high affinity of the phosphoryl group for metal ions. The use of IMAC for phosphoproteins dates back to 1986, when Andersson and Porath first reported the isolation of phosphoproteins by immobilized metal (Fe3+) affinity chromatography (Andersson and Porath, 1986; Anguenot et al., 1999; Li and Dass, 1999; Lee et al., 2007). Subsequently, several other groups reported the immobilization of Ga3+ (Posewitz and Tempst, 1999; Seeley et al., 2005; Aryal et al., 2008), Al3+ (Andersson, 1991), and Zr4+ (Feng et al., 2007b) to selectively isolate phosphoproteins. In IMAC, metal-chelating agents such as iminodiacetic acid (IDA) or nitrilotriacetic acid (NTA) are typically used to immobilize the metal ions (Andersson and Porath, 1986; Andersson, 1991; Posewitz and Tempst, 1999; Aryal et al., 2008). Zou’s group also developed a method to immobilize Zr4+ and Ti4+ via phosphate polymers (Han et al., 2008b; Zhou et al., 2008; Yu et al., 2009). With the growth of proteomic and phosphoproteomic research, several groups have applied the IMAC strategy to large-scale phosphoproteomic studies (Jin et al., 2004; Moser and White, 2006; Feng et al., 2007a; Machida et al., 2007).
However, the IMAC method has limitations arising from competition between phosphoryl groups and other charged groups. Because IMAC relies on the affinity between negatively charged phosphate groups and positively charged immobilized metal ions, other negatively charged groups compete with the phosphoryl groups and reduce the specificity of the enrichment. The most common interference comes from carboxylic acid groups (e.g., in peptides rich in glutamic and aspartic acid), which are also negatively charged at neutral pH. To reduce this competition, IMAC enrichment typically uses a low pH for peptide binding, so that the carboxylic acid groups are protonated and uncharged and therefore have little affinity for the IMAC resin. Elution is then normally performed by raising the pH or by adding phosphate ions to the eluent to disrupt the binding between phosphoryl groups and metal ions (Andersson and Porath, 1986). Several groups have tried to improve the selectivity of phosphopeptide binding by adjusting the binding and elution conditions. Aryal et al. optimized the binding and elution conditions for a commercially available gallium(III)-IMAC column (PhosphoProfile, Sigma) (Aryal et al., 2008). The authors suggested that the selectivity of a gallium(III)-IMAC column toward phosphopeptides can be increased by loading peptides in 1% trifluoroacetic acid, and that both singly and multiply phosphorylated peptides can be efficiently recovered by elution with 0.4 M ammonium hydroxide. They also reported similar results when eluting with 50% acetonitrile containing 20 mg/mL dihydroxybenzoic acid and 1% phosphoric acid. Liu, Stupak, and colleagues developed an open tubular immobilized metal ion affinity chromatography (OT-IMAC) method and used ammonium phosphate for phosphopeptide elution (Liu et al., 2004; Stupak et al., 2005). Stensballe et al. used a mixture of 2,5-dihydroxybenzoic acid (2,5-DHB) and o-phosphoric acid to elute phosphopeptides from IMAC, and confirmed by MALDI analysis that both singly and multiply phosphorylated peptides were recovered efficiently (Stensballe and Jensen, 2004). Imanishi et al. reported another method, using acetonitrile combined with phosphoric acid as the elution buffer, and recovered phosphopeptides from IMAC resin with high efficiency (Imanishi et al., 2007). Thingholm et al. also presented the SIMAC (sequential elution from IMAC) concept, in which IMAC-bound phosphopeptides from highly complex biological samples were eluted sequentially with different buffers and the eluates were further enriched with TiO2; this separated mono-phosphorylated peptides from multiply phosphorylated ones and achieved a 3-fold increase in the recovery of multiply phosphorylated peptides (Thingholm et al., 2008).
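The rationale for low-pH loading can be illustrated with the Henderson-Hasselbalch relationship, which gives the fraction of an acidic group in its deprotonated (charged) form as 1 / (1 + 10^(pKa - pH)). The Python sketch below uses approximate, assumed pKa values (about 4.0 for Asp/Glu side-chain carboxyls and about 2.0 for the first dissociation of a phosphate monoester); actual values depend on the peptide context and solvent, so the numbers are only indicative.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid group carrying a negative charge."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Approximate pKa values, assumed for illustration only.
ASSUMED_PKA = {
    "Asp/Glu carboxyl": 4.0,
    "phosphate (first dissociation)": 2.0,
}

for ph in (2.5, 7.0):
    for group, pka in ASSUMED_PKA.items():
        charged = fraction_deprotonated(ph, pka)
        print(f"pH {ph}: {group} is {charged:.0%} deprotonated")
```

Under these assumptions, at pH 2.5 only about 3% of carboxyl groups are charged while roughly three quarters of the phosphate groups still carry a negative charge; at pH 7 both are almost fully charged and compete for the immobilized metal ions.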
Besides optimization of binding and elution conditions, it has also been reported that converting the carboxylic acid groups in a peptide mixture before IMAC effectively eliminates the nonspecific retention of non-phosphorylated peptides on IMAC columns. Ficarro et al. used methyl esterification of free carboxylic acid groups to eliminate their competition with phosphoryl groups (Ficarro et al., 2002). However, a disadvantage of this method is that modifying the carboxylic acid groups complicates database searching, because the additional modifications must be accounted for by the search algorithm.
MOAC. Enrichment of phosphopeptides by MOAC is a newer technology. The best known and most commonly used metal oxide is TiO2. The use of TiO2 to separate phosphopeptides and nucleotides was first reported by Ikeguchi and Nakamura in 1997 (Ikeguchi and Nakamura, 1997). They reported that phosphates could efficiently bind to TiO2 under acidic conditions and be eluted under basic conditions, which made phosphopeptide enrichment with TiO2 both possible and practical. In 2004-2005, several groups incorporated TiO2 into large-scale phosphoproteome analysis (Kuroda et al., 2004; Pinkse et al., 2004; Larsen et al., 2005; Schlosser et al., 2005). Schlosser et al. reported a high recovery rate of approximately 90% after optimizing the selectivity of phosphopeptide enrichment with Titansphere.
MOAC uses a principle similar to that of IMAC: the positively charged metal oxide surface has an affinity for negatively charged phosphoryl groups. As with IMAC, acidic peptides rich in glutamic and aspartic acid residues cause contamination by non-phosphorylated peptides. To reduce competition from acidic peptides, adding DHB (Larsen et al., 2005), phthalic acid (Thingholm et al., 2006; Bodenmiller et al., 2007), the sodium salt of 1-octanesulfonic acid (Mazanek et al., 2007), glutamic acid (Wu et al., 2007), or HFBA (Mazanek et al., 2010) to the loading buffer has been reported to largely prevent interference from non-phosphorylated peptides, because these acids bind competitively to TiO2 and outcompete other carboxylic acids (Eyrich et al., 2011). However, a problem with adding aromatic acids such as DHB and phthalic acid is that, being very hydrophobic, they can be difficult to remove in later desalting steps and can complicate LC-MS analysis. To solve this problem, Sugiyama et al. developed aliphatic hydroxy acid-modified metal oxide chromatography (HAMMOC) (Sugiyama et al., 2007). They demonstrated phosphopeptide enrichment using titania and zirconia with the aid of lactic acid and beta-hydroxypropanoic acid, and successfully reduced the binding of acidic non-phosphopeptides to the metal oxides. As with IMAC, converting peptide carboxylates into their corresponding methyl esters was also reported to sharply reduce nonspecific binding and improve selectivity for phosphopeptides (Simon et al., 2008). Typical elution conditions for metal oxides were discussed by Eyrich et al., who indicated that phosphopeptides can be eluted with ammonium bicarbonate containing 50 mM ammonium phosphate (pH 10.5), with ammonia solution (pH 10.5-11), or with a pH gradient from pH 8.5 (100 mM triethylammonium bicarbonate) to pH 11.5 (Eyrich et al., 2011). The metal oxide can be packed in a column for online analysis (Kuroda et al., 2004), packed in tips for offline enrichment (Thingholm et al., 2006), or used directly as a suspension of spheres (Li et al., 2009).
Besides TiO2, the most popular choice, other researchers have shown phosphopeptide enrichment with zirconium dioxide (ZrO2) (Cuccurullo et al., 2007; Zhou et al., 2007; Nelson et al., 2009; Mazanek et al., 2010) and aluminum hydroxide (Al(OH)3) (Wolschin et al., 2005). Others have reported carefully engineered metal oxide microspheres for phosphopeptide enrichment, such as Fe3O4@TiO2 core-shell microspheres (Li et al., 2008) and Fe3O4@SiO2@CeO2 microspheres (Cheng et al., 2011). These microspheres were reported to facilitate rapid enrichment and isolation, and the mesoporous CeO2 shell of the Fe3O4@SiO2@CeO2 microspheres was reported to improve the selectivity and efficiency of phosphopeptide enrichment.
SCX, SAX, HILIC. SCX is an alternative for phosphopeptide enrichment and fractionation. It has been used successfully to separate phosphopeptides from peptide mixtures prior to LC-MS analysis and is frequently used as one dimension of the multidimensional liquid chromatography in multidimensional protein identification technology (MudPIT) (Washburn et al., 2001; Beausoleil et al., 2004). Traditional SCX column particles are based on sulfopropyl (SP) groups. SCX-based phosphopeptide enrichment depends on the charge state of the peptides. A typical tryptic peptide has a net charge of +2 at pH 2.7-3.0, owing to the positively charged N-terminus at one end and the charged Lys/Arg residue at the other. About 65% of the peptides in a complex tryptic digest carry a +2 charge under these conditions. Attachment of a phosphate group lowers the net charge to +1, a state that applies to roughly 30% of phosphopeptides. Peptides with +3 or +4 charges change to +2 and +3 after phosphorylation and elute later in the gradient. The +1 charged phosphopeptides elute in the starting mobile phase; typically, they are retained for 2-5 column volumes beyond the void volume, whereas peptides with higher charges are eluted with a salt gradient. SCX therefore provides good enrichment of +1 charged phosphopeptides in the early fractions.
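The charge-state argument above can be sketched as a simple counting rule. The Python example below is a rough model, assumed for illustration: it counts the N-terminus and basic residues (K, R, and also H, whose side chain is protonated at this pH) as +1 each, treats carboxyl groups as neutral, and subtracts one charge per phosphate. It ignores other effects on retention, and the peptide and function names are hypothetical.

```python
def scx_net_charge(peptide: str, n_phospho: int = 0) -> int:
    """Approximate net charge of a peptide at pH ~2.7-3.0.

    Simplified model (assumption): +1 for the free N-terminus and +1 for each
    K, R, or H side chain; carboxyl groups are treated as neutral at this pH,
    and each phosphate group contributes -1.
    """
    basic_residues = sum(peptide.count(aa) for aa in "KRH")
    return 1 + basic_residues - n_phospho


def scx_behavior(charge: int) -> str:
    """Qualitative SCX behavior implied by the charge-state reasoning."""
    if charge <= 0:
        return "not retained (flow-through)"
    if charge == 1:
        return "elutes early, near the void volume"
    return "retained and eluted later by the salt gradient"


# Hypothetical tryptic peptide with a single C-terminal lysine.
for n in (0, 1, 2):
    z = scx_net_charge("SAMPLEPEPTIDEK", n_phospho=n)
    print(f"{n} phosphate(s): net charge {z:+d}, {scx_behavior(z)}")
```

In this model the unmodified peptide is +2, the singly phosphorylated form drops to +1 and elutes early, and the doubly phosphorylated form has no net charge, corresponding to multiply phosphorylated peptides that are not retained on SCX.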
Gygi’s laboratory used SCX for phosphopeptide enrichment in large-scale phosphoproteome analysis (Peng et al., 2003; Villen and Gygi, 2008; Zhai et al., 2008; Dephoure and Gygi, 2011). A different approach was developed by Heck’s group, who used Lys-N instead of trypsin for proteolytic digestion; in combination with SCX, they were able to separate peptides of different types, such as N-terminal peptides, phosphorylated peptides with a single lysine, peptides with a single basic residue (lysine), and peptides with multiple basic residues (Taouatas et al., 2011).
However, a disadvantage of SCX is that multiply phosphorylated peptides with no net charge or a net negative charge are not retained on the SCX column and appear in the flow-through. As a result, the SCX flow-through is enriched in multiply phosphorylated peptides and needs to be analyzed further (Macek et al., 2009). In large-scale phosphoproteomic research, SCX is often combined with other enrichment methods such as IMAC and MOAC to further enhance enrichment (Villen and Gygi, 2008; Zhai et al., 2008).
Besides SCX, other chromatography-based methods are widely used for both pre-fractionation and phosphopeptide enrichment in phosphoproteomic research. One of these is strong anion exchange (SAX). Because phosphopeptides are more acidic than non-phosphopeptides, SAX is an alternative for phosphopeptide enrichment and fractionation (Han et al., 2008a). Nuhse et al. reported that combining SAX chromatography with IMAC enrichment yielded more phosphorylated peptides from Arabidopsis (Nuhse et al., 2003). They also reported that SAX and SCX pre-fractionation preferentially enriched different subsets of phosphopeptides (Nuhse et al., 2007).
Hydrophilic interaction liquid chromatography (HILIC) has been used to separate molecules that are weakly retained or not retained at all on reversed-phase columns, especially highly polar and hydrophilic ones (Alpert, 1990; Mant et al., 1998; McNulty and Annan, 2008). In HILIC, phosphopeptides tend to have longer retention times than non-phosphopeptides and can therefore be separated from them (Alpert, 1990). HILIC has been widely used in large-scale phosphoproteome analyses (McNulty and Annan, 2008; McNulty and Annan, 2009; Hao et al., 2011).
Comparison and combination of enrichment methods. Although significant effort has been made to optimize phosphopeptide enrichment conditions, none of the above strategies gives a 100% yield of phosphopeptides. Several studies have compared the enrichment methods mentioned above. Cantin et al. optimized loading, washing, and elution conditions for TiO2 and showed that TiO2 performed better than Fe(III)-IMAC and ZrO2 for both simple and moderately complex samples (Cantin et al., 2007). Negroni et al. evaluated the performance of Fe(III)-IMAC and TiO2-MOAC packed columns with casein standards and mouse liver extracts. The authors reported that phosphopeptide selectivity increased from 12-18% to 58-60% when 0.1 M trifluoroacetic acid was used as the loading buffer instead of 0.1 M acetic acid. However, this procedure induced Fe(III) leaching, which decreased the binding capacity and thus reduced the final number of phosphopeptide identifications with the Fe(III)-IMAC column. In addition, Fe(III)-IMAC column stability was affected by elution with 0.5 M NH4OH, which resulted in Fe2O3 accumulation in the column. By comparison, TiO2-MOAC columns showed much better stability under the same conditions. The authors concluded that TiO2-MOAC was the better choice for packed-column phosphopeptide enrichment (Negroni et al., 2012).
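Selectivity and the number of identifications are separate figures of merit, which is why a higher selectivity can coexist with fewer phosphopeptide identifications, as in the Fe(III)-IMAC case above. The Python sketch below assumes selectivity is defined as the percentage of identified peptides that are phosphorylated (a common convention, assumed here rather than taken from the cited studies); the function name and the counts are invented for illustration.

```python
def selectivity(n_phospho: int, n_total: int) -> float:
    """Percentage of identified peptides that are phosphopeptides."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_phospho <= n_total:
        raise ValueError("n_phospho must be between 0 and n_total")
    return 100.0 * n_phospho / n_total


# Hypothetical results as (phosphopeptides, all identified peptides).
runs = {
    "condition 1": (150, 1000),
    "condition 2": (120, 200),
}

for name, (n_phospho, n_total) in runs.items():
    print(f"{name}: {n_phospho} phosphopeptides, "
          f"selectivity {selectivity(n_phospho, n_total):.0f}%")
```

In this invented example, condition 2 is four times more selective (60% versus 15%) yet yields fewer phosphopeptides (120 versus 150), so both numbers are needed when comparing enrichment methods.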
In another study, the authors compared the selectivity of a variety of metal chelate and metal oxide affinity materials for phosphopeptide isolation. They concluded that Fe(III) chelate resin-coupled magnetic beads worked as well as TiO2-coated Dynabeads or TiO2 spheres, and better than other materials such as Ga(III), Fe(III), or Ga(III)-IDA-coated magnetic particles (Liang et al., 2007).
To improve phosphopeptide enrichment, some groups have combined more than one method for fractionation and enrichment. One of the most popular approaches is to add an IMAC or MOAC enrichment step after SCX fractionation, which further increases the proportion of phosphopeptides and decreases the complexity of the SCX fractions before LC-MS analysis (Villen and Gygi, 2008; Zhai et al., 2008; Huttlin et al., 2010). Gel separation followed by MOAC enrichment was also reported to increase confidence in the identification of phosphoproteins and the assignment of phosphosites in complex samples (Wolschin and Weckwerth, 2005).
Other groups added steps for capturing multiply phosphorylated peptides from the SCX flow-through to increase phosphopeptide identification. Zeng and coworkers developed a Yin-Yang multidimensional LC-MS system for profiling protein phosphorylation, in which phosphopeptides were first enriched on an SCX column and the SCX flow-through was then loaded onto a SAX column to capture the multiply phosphorylated peptides it contained (Dai et al., 2007). With this method, they identified 849 phosphopeptides with 809 phosphorylation sites from 1 mg of mouse liver. Hennrich et al. (2011) added a weak anion exchange (WAX) step after SCX, which dramatically increased phosphopeptide identification efficiency; they reported that the number of identified phosphopeptides increased from 4045 to 11000. Zarei et al. (2012) added an electrostatic repulsion-hydrophilic interaction chromatography (ERLIC) step before or after SCX and successfully separated singly phosphorylated peptides from multiply phosphorylated ones. The additional ERLIC step greatly increased the identification of multiply phosphorylated peptides and thus increased the total number of identified phosphopeptides by 48%.
These efforts have provided promising directions for in-depth phosphoproteomic analysis of complex samples. However, one disadvantage of such multi-step combinations of enrichment methods is that they substantially increase the workload and LC-MS time. Phosphopeptide enrichment is still far from perfect, and highly efficient yet simple enrichment methods would greatly benefit the field.
LC-MS methods for phosphoproteomics
Mass spectrometric technology has developed quickly over the past decade. High-quality phosphoproteomic research requires a tandem mass spectrometer, which is typically composed of an ion source, multiple mass analyzers, and a detector. Analyte ions are produced by the ion source and then sent to the mass analyzers for analysis. The most commonly used ion sources are electrospray ionization (ESI), in which peptides in solution are charged by a high electric potential, and matrix-assisted laser desorption/ionization (MALDI), in which peptides embedded in a solid acidic matrix are ionized following laser irradiation.
Hybrid mass spectrometers are most frequently used in phosphoproteomic research, such as quadrupole-time-of-flight (Q-TOF) (Kristensen et al., 2000), quadrupole-ion trap (Garrett et al., 2011; Zgola-Grzeskowiak and Grzeskowiak, 2011), ion trap-Orbitrap (Makarov et al., 2006a; Makarov et al., 2006b; Yates et al., 2006; McAlister et al., 2008), and ion trap-Fourier transform ion cyclotron resonance (IT-FTICR) (Syka et al., 2004b) instruments. In these instruments, precursor ions isolated in the mass analyzer are fragmented by different methods according to the needs of the specific experiment. The fragmentation methods mainly used in phosphoproteomic research are discussed below.
CID/CAD. One of the most commonly used fragmentation methods is collision induced dissociation (CID), also referred as collisionally activated dissociation (CAD) (
Hunt et al., 1986). CID is used in ion-traps and is usually a resonant-excitation type with low energy (<2eV). The collisions with CID are initiated by the acceleration of precursor ions with electrical potential to high kinetic energy and then inducing their collision with neutral gas atoms to transfer the kinetic energy into vibrational energy. The vibrational energy will then cause the breakage of covalent peptide bonds. Because the energy redistribution rate is higher than the dissociation rate, the energy will be randomly distributed over all the bonds so that the weakest bond is the easiest to break (
Frese et al., 2011). In the case of unmodified peptides, each amino acid bond has similar probabilities to fragment so that similar amounts of different b- and y- ions can be generated and used for sequence confirmation. However, in the case of phosphopeptides, because the bonds between the phosphoryl group to serine and threonine are much weaker than the bonds between amino acids, the spectra of phosphopeptides containing phosphoserine and phosphothreonine will show a dominant fragmentation of phosphoryl groups. The precursor ions that lost phosphoryl groups no longer retain the vibration energy. They will also be ejected from the isolation window and are not further fragmented, resulting in poor quality MS/MS spectra. In phosphoproteomic research using CID fragmentation methods, poor identification scores and ambiguous phosphosite assignments are common and are due to inefficient MS/MS fragmentation events. However, because CID can fragment precursors faster and with higher efficiency compared with other methods, it is still widely used in phosphoproteomic research. On the other hand, the neutral loss of phosphate (HPO
3, 80 Da; H
3PO
4, 98 Da) from the precursor ion can actually serve as a reporter group for neutral loss scanning (
Carr et al., 1996). Several groups have reported utilizing the neutral loss scan for phosphopeptide detection using data-based MS/MS methods (
Schlosser et al., 2001;
Schroeder et al., 2004;
Sweet et al., 2006;
Lehmann et al., 2007;
Hsiao and Urlaub, 2010).
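Because a neutral loss removes mass but not charge, the neutral-loss product appears at an m/z offset that depends on the precursor charge state. The Python sketch below illustrates this arithmetic, assuming monoisotopic masses of 97.9769 Da for H₃PO₄ and 79.9663 Da for HPO₃ and a fixed m/z tolerance; the function names are hypothetical and not taken from any instrument software.

```python
# Monoisotopic masses of the neutral losses discussed in the text (Da).
H3PO4_LOSS = 97.97690  # phosphoric acid, typical for pSer/pThr in CID
HPO3_LOSS = 79.96633   # metaphosphoric acid


def neutral_loss_mz(precursor_mz: float, charge: int, loss_mass: float = H3PO4_LOSS) -> float:
    """Expected m/z of a precursor after losing a neutral of mass loss_mass.

    The charge is retained, so the m/z shift equals loss_mass / charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - loss_mass / charge


def is_neutral_loss_peak(precursor_mz: float, charge: int, fragment_mz: float,
                         tol: float = 0.02, loss_mass: float = H3PO4_LOSS) -> bool:
    """True if fragment_mz matches the expected neutral-loss product within tol (m/z units)."""
    return abs(fragment_mz - neutral_loss_mz(precursor_mz, charge, loss_mass)) <= tol


# A doubly charged precursor at m/z 600.0 shows the H3PO4 loss near m/z 551.01.
print(round(neutral_loss_mz(600.0, 2), 3))
```

The same 98 Da loss therefore appears 49 m/z units below a doubly charged precursor but about 32.7 m/z units below a triply charged one, which is why neutral-loss-triggered acquisition needs the precursor charge state.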
HCD. Higher-energy collisional dissociation (HCD) was first introduced on LTQ Orbitrap instruments under the name higher-energy C-trap dissociation (
Olsen et al., 2007). HCD is similar to the beam-type CID (about 100 eV) originally used in triple quadrupole and Q-TOF instruments and follows the same fragmentation principle as CID, first generating b- and y-ions. Because more energy is deposited in HCD, however, these primary fragments (particularly b-ions) can fragment further, and HCD spectra can be of higher quality. On the other hand, HCD acquires data more slowly than CID, and for complex samples it has been reported to yield fewer phosphopeptide identifications, albeit with higher identification scores (
Jedrychowski et al., 2011). The combination of CID and HCD provides a solution. Wu et al. sequentially collected CID and HCD spectra for peptide identification and quantification with an Orbitrap mass spectrometer, and successfully identified 3,557 and quantified 2,079 distinct phosphopeptides from HeLa cell lysates (
Wu et al., 2010). In addition, the Q-Exactive, a benchtop Orbitrap instrument, offers high ion currents and fast HCD acquisition through parallel ion filling and detection, so HCD is no longer limited by low speed. These attributes make the Q-Exactive a promising instrument for phosphoproteomic research (
Michalski et al., 2011).
ETD/ECD. Other fragmentation methods frequently used in phosphoproteomic research include electron capture dissociation (ECD) (Zubarev, 2004) and electron transfer dissociation (ETD) (
Syka et al., 2004a). ETD and ECD cleave mainly the N–Cα bond of the peptide backbone, generating c- and z-ions, and they tend to retain labile modifications such as phosphate groups on the fragment ions (
Frese et al., 2011). ETD has been reported to be particularly useful for highly charged peptides, whereas doubly charged peptides fragment poorly (
Swaney et al., 2007). Several other reports compared ETD/ECD with CID and HCD. Frese et al. combined different fragmentation methods (ETD, CID, and HCD) with different mass analyzers (linear ion traps and Orbitraps). They concluded that HCD identified more doubly charged peptides than CID and ETD, and suggested that combining HCD and ETD could increase the average Mascot score (
Frese et al., 2011). Good et al. performed a large-scale proteomic study using ETD and CID and found very little overlap between the peptides identified by the two methods. They suggested that ETD fragmentation is complementary to CID and that combining the two methods could increase sequence coverage (
Good et al., 2007).
Choosing the most suitable instrument and fragmentation technique for different types of peptides is important for maximizing protein/peptide identifications and sequence coverage from a complex sample. More detailed information on tandem mass spectrometric strategies for phosphoproteome analysis can be found in the reviews by Palumbo et al. and Boersema et al. (
Boersema et al., 2009;
Palumbo et al., 2011).
Phosphoproteomic approaches for quantification of phosphoproteins
In phosphoproteomic research, it is not enough to identify the phosphorylated proteins involved in a biological process; changes in phosphorylation levels must also be quantified. Several mass spectrometry-based methods have been developed for protein quantification, the most popular being label-free approaches and metabolic or chemical isotope labeling methods.
Label-free quantification. One label-free approach is based on the spectral counts of identified peptides. In LC-MS analysis, more abundant peptides produce higher and wider elution peaks and are therefore selected for fragmentation more often, generating more MS/MS spectra. On this basis, peptide abundance can be correlated semiquantitatively with the number of spectra assigned to the peptide. Petricoin’s group used spectral counting to quantitatively compare pancreatic ductal adenocarcinoma with normal pancreatic duct cells in their recent work (
Zhou et al., 2011;
Zhou et al., 2012a). They identified more than 1,700 proteins and quantitatively evaluated a large number of them. Xie et al. applied spectral counting to compare the phosphoproteomes of two human tumor cell lines with different metastatic potential, and identified proteins associated with tumor metastasis (
Xie et al., 2010).
Although spectral counting does not require expensive reagents and is relatively easy to perform, inconsistent sample processing and/or differences in chromatography across analyses can introduce quantification errors. Including both biological and technical replicates can reduce relative errors, although this increases instrument time. In addition, when analyzing complex samples, and especially in phosphoproteomic analyses, many phosphoproteins are quantified from only a single phosphopeptide, so the accuracy of assigning that phosphopeptide to a specific phosphoprotein strongly affects the quantification of the phosphoprotein. To address this problem, Chen et al. suggested comparing spectral counts for peptide groups rather than protein groups, which avoids problems caused by shared peptides (
Chen et al., 2012). More details on spectral counting can be found in Zhou’s recent review (
Zhou et al., 2012b).
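The spectral counting idea can be made concrete with a short Python sketch that counts peptide-spectrum matches (PSMs) per peptide in two runs and compares them at the peptide level, in the spirit of the peptide-group comparison described above. The normalization to each run’s total PSM count and the pseudocount of 0.5 are illustrative assumptions of ours, not the procedure of any cited study.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def spectral_counts(psm_peptides: Iterable[str]) -> Counter:
    """Number of MS/MS spectra (PSMs) assigned to each peptide sequence."""
    return Counter(psm_peptides)


def log2_ratios(run_a: Iterable[str], run_b: Iterable[str],
                pseudocount: float = 0.5) -> Dict[str, float]:
    """Peptide-level log2(A/B) ratios from two lists of PSM peptide assignments.

    Counts are normalized to each run's total number of PSMs so that runs with
    different numbers of identified spectra can be compared; the pseudocount
    avoids division by zero for peptides observed in only one run.
    """
    counts_a, counts_b = spectral_counts(run_a), spectral_counts(run_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    ratios = {}
    for peptide in set(counts_a) | set(counts_b):
        frac_a = (counts_a[peptide] + pseudocount) / total_a
        frac_b = (counts_b[peptide] + pseudocount) / total_b
        ratios[peptide] = math.log2(frac_a / frac_b)
    return ratios
```

For example, a peptide matched by 6 of 10 spectra in one run and 2 of 10 in another gets log2(6.5/2.5), about 1.38. With so few counts such ratios are noisy, which is one reason replicate analyses are recommended.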
Another frequently used label-free method uses peak intensities in MS spectra to quantify the abundance of precursor peptides, and the corresponding MS/MS spectra for identification, as shown in Fig. 2A. For quantification, an extracted ion chromatogram (XIC) is first generated by plotting the precursor ion intensity as a function of retention time. After normalization between samples, XICs can be used for relative quantitative comparisons (
Steen et al., 2005). In phosphoproteomic research, the quantification is typically achieved by directly comparing phosphopeptides from two samples analyzed on the same platform, and the levels of non-phosphopeptides derived from the same protein can be used for normalization (
Steen et al., 2005). Another approach introduces a synthetic, stable-isotope-labeled phosphopeptide into each sample as a spike-in standard and uses it as a reference for absolute quantification (AQUA) across samples (
Gerber et al., 2003). Wang et al. also developed a computational method for label-free quantification, by which they normalized the intensities of all the detected precursor ions with a non-isotopic internal reference (
Wang et al., 2006). Because samples must be run separately for intensity-based label-free quantification, this method depends heavily on mass accuracy and resolution, as well as on the reproducibility of the LC system, so that the same peptide has the same retention time across LC-MS analyses (
Kosako and Nagano, 2011). Also, the need to run samples separately increases the instrument time requirements. More detailed reviews describing label-free relative and absolute quantification are available (
Kito and Ito, 2008;
Neilson et al., 2011).
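The XIC and spike-in calculations described above can be sketched in Python. This is a simplified illustration under our own assumptions: centroided MS1 scans given as (retention time, list of (m/z, intensity)) tuples, a fixed ppm extraction window, and trapezoidal integration over a user-chosen retention-time range. Real tools add peak detection, retention-time alignment, and isotope-envelope handling. The AQUA-style step simply scales the light/heavy area ratio by the known amount of spiked heavy standard; all function names are hypothetical.

```python
from typing import List, Tuple

# (retention time in min, [(m/z, intensity), ...]) for one centroided MS1 scan
Scan = Tuple[float, List[Tuple[float, float]]]


def extract_ion_chromatogram(scans: List[Scan], target_mz: float,
                             ppm: float = 10.0) -> List[Tuple[float, float]]:
    """Summed intensity within +/- ppm of target_mz in each MS1 scan."""
    tol = target_mz * ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        intensity = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, intensity))
    return xic


def peak_area(xic: List[Tuple[float, float]], rt_start: float, rt_end: float) -> float:
    """Trapezoidal integration of the XIC between rt_start and rt_end."""
    pts = [(rt, i) for rt, i in xic if rt_start <= rt <= rt_end]
    return sum((rt2 - rt1) * (i1 + i2) / 2
               for (rt1, i1), (rt2, i2) in zip(pts, pts[1:]))


def aqua_amount(light_area: float, heavy_area: float, spiked_fmol: float) -> float:
    """Amount of endogenous (light) peptide from a heavy spike-in of known amount."""
    return light_area / heavy_area * spiked_fmol
```

A relative label-free comparison would divide the normalized areas of the same peptide from two runs; the AQUA calculation instead compares light and heavy forms within one run, which is what makes it insensitive to run-to-run variation.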
SILAC/SILAM. Stable isotope labeling by amino acids in cell culture (SILAC) is an isotopic labeling method based on cellular metabolism. This method was first reported by Bradbury’s group (
Chen et al., 2000) and further developed by Mann’s group (
Ong et al., 2002;
Ong et al., 2003a;
Ong et al., 2003b;
Ong and Mann, 2005;
Ong and Mann, 2006;
Ong and Mann, 2007). In a SILAC experiment, dividing cells acquire isotopically labeled amino acids from the growth media. Since mammalian cells must acquire certain “essential” amino acids from the growth media, if these amino acids are labeled with heavy isotopes, proteins within the cells will also be labeled. The amino acids that can be used for SILAC labeling include leucine, lysine and methionine. Some nonessential amino acids such as arginine and tyrosine were also successfully tested in some cell lines for SILAC labeling.
The typical procedure for a SILAC experiment is summarized by Ong and Mann (
Ong and Mann, 2006) and illustrated in Fig. 2B. Cells are grown in media lacking the amino acids to be labeled and supplemented with dialyzed serum. The omitted amino acids are then added back to the media in either “light” or “heavy” isotopic form before culture. To compare cells treated with a stimulus with cells cultured under basal physiologic conditions, one population is grown in “light” medium, typically containing L-arginine-¹²C₆, L-arginine-¹²C₆-¹⁴N₄, L-lysine-¹²C₆, or L-lysine-¹²C₆-¹⁴N₂, and is stimulated under the desired condition. The control cells are grown in “heavy” medium containing the corresponding heavy forms (L-arginine-¹³C₆, L-arginine-¹³C₆-¹⁵N₄, L-lysine-¹³C₆, or L-lysine-¹³C₆-¹⁵N₂) and are kept under basal physiologic conditions (
Pimienta et al., 2009;
Zhong et al., 2012). SILAC has also been used to compare three conditions, using “light,” “medium,” and “heavy” isotopic forms of lysine and arginine (
Pan et al., 2009).
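Because trypsin cleaves after lysine and arginine, most tryptic peptides carry one labeled residue, so light/heavy pairs appear at a predictable m/z spacing. The Python sketch below computes that spacing, assuming the heavy forms listed above (¹³C₆ and ¹³C₆¹⁵N₂ lysine, ¹³C₆ and ¹³C₆¹⁵N₄ arginine) and standard isotope mass differences; the label keys such as "Lys8" are our own shorthand, not a reagent catalog.

```python
# Mass difference per heavy atom (Da): 13C - 12C and 15N - 14N.
DELTA_C13 = 1.0033548
DELTA_N15 = 0.9970349

# Shorthand label -> (labeled residue, number of 13C atoms, number of 15N atoms).
LABELS = {
    "Lys6": ("K", 6, 0),   # L-lysine-13C6
    "Lys8": ("K", 6, 2),   # L-lysine-13C6-15N2
    "Arg6": ("R", 6, 0),   # L-arginine-13C6
    "Arg10": ("R", 6, 4),  # L-arginine-13C6-15N4
}


def label_shift(label: str) -> float:
    """Mass increase (Da) of one residue carrying the given heavy label."""
    _, n_c13, n_n15 = LABELS[label]
    return n_c13 * DELTA_C13 + n_n15 * DELTA_N15


def silac_pair_spacing(peptide: str, charge: int, labels=("Lys8", "Arg10")) -> float:
    """m/z spacing between the light and heavy forms of a peptide.

    `labels` should contain one heavy form per labeled residue type.
    """
    shift = sum(peptide.count(LABELS[lab][0]) * label_shift(lab) for lab in labels)
    return shift / charge
```

For a doubly charged peptide ending in lysine, such as LVNELTEFAK, the Lys8 pair is separated by about 4.007 m/z, whereas a peptide with a missed cleavage carrying two labeled residues shows twice that spacing.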
The SILAC method is considered more accurate than the chemical labeling methods discussed below, because proteins are labeled during cell culture and no chemical derivatization of proteins or peptides is required, so SILAC avoids the incomplete-reaction problems common to chemical labeling. Cell lysates can be mixed immediately after protein extraction, before all other sample processing steps, which reduces errors associated with sample preparation. However, SILAC depends on the incorporation of the isotopes into cellular proteins. Cells normally need to be cultured for several doublings to incorporate the labels fully before analysis, which is nearly impossible for primary cells and tissues. In addition, SILAC media are not commercially available for every cell line, and not every cell line tolerates dialyzed serum, so the culture system must be carefully tested and optimized for sensitive cell lines. Finally, metabolic conversion of labeled arginine to proline can reduce quantification accuracy.
To quantify harvested samples that cannot be labeled metabolically, such as tissue homogenates or body fluids, several groups used SILAC-labeled cell lines as spike-in standards and calculated relative protein amounts across tissues (
Ishihama et al., 2005;
Monetti et al., 2011). Mann’s group also reported a method they named “super-SILAC,” in which a mixture of five different SILAC-labeled cell lines serves as a complex spike-in standard. Because a mixture of cell lines more closely resembles the protein content of a tissue composed of many cell types, the super-SILAC standard allows more tissue proteins to be matched and quantified than a single SILAC-labeled cell line (Geiger et al., 2012; Deeb et al., 2012).
Yates’s group developed a strategy called “stable isotope labeling in mammals” (SILAM) (
McClatchy et al., 2007;
McClatchy et al., 2011). In this approach, model animals are fed isotope-labeled food. The group used a ¹⁵N-labeled rat brain as an internal standard for large-scale analysis of the mammalian brain. Several other groups have also reported the use of SILAM for labeling entire mice (
Krüger et al., 2008;
Scholten et al., 2011;
Zanivan et al., 2012).
iTRAQ and ICAT. Chemical labeling is an in vitro approach that tags proteins or peptides directly and can be applied to samples that are difficult to label in vivo. The most popular methods are isobaric tags for relative and absolute quantification (iTRAQ) (
Ross et al., 2004), tandem mass tags (TMT) (
Thompson et al., 2003), and isotope-coded affinity tags (ICAT) (
Gygi et al., 1999). All of these methods use isotope-containing molecules, which are attached in vitro to reactive amino acid residues in the different samples. In phosphoproteomic research, the differently tagged samples are then mixed and fractionated, and the phosphorylated peptides are enriched using the methods reviewed above.
The iTRAQ method was first reported by Ross et al. (
Ross et al., 2004). In iTRAQ, isobaric mass tags are attached to the peptide N-terminus and to the ε-amino groups of lysine side chains. Each iTRAQ reagent contains an amine-reactive group, a mass balance group, and a reporter group, and both the reporter and balance groups carry stable isotopes. Because the reporter and balance groups are combined so that every tag has the same total mass, the same peptide labeled with different tags has an identical precursor mass; the differently labeled forms therefore co-elute from the HPLC column and are isolated and fragmented together. Fragmentation releases the reporter groups: the peptides are quantified by comparing the intensities of the reporter ions from the different iTRAQ labels, while the other fragment ions are used to identify the peptides. Two versions of the iTRAQ system are commercially available: a four-plex and an eight-plex version (
Jadaliha et al., 2012;
Pottiez et al., 2012). However, Pichler et al. suggested that the eight-plex iTRAQ labeling gave lower identification rates compared with four-plex iTRAQ labeling (
Pichler et al., 2010). There are several reports that have applied iTRAQ for quantification of phosphopeptides (
Boja et al., 2009;
Yang et al., 2009;
Rudrabhatla et al., 2010;
Wu et al., 2010;
Mertins et al., 2012). More details about iTRAQ for phosphoproteomic analysis can be found in Jones’s and Evans’s reviews (
Jones and Nuhse, 2011;
Evans et al., 2012).
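Quantification from the reporter ions described above reduces to reading a few low-m/z peaks in each MS/MS spectrum. The Python sketch below assumes a centroided peak list and uses approximate reporter m/z values for the four-plex reagent; it omits the isotope-impurity correction that quantification software normally applies, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Approximate m/z of the iTRAQ four-plex reporter ions (assumed values).
REPORTERS_4PLEX = {"114": 114.1112, "115": 115.1083, "116": 116.1116, "117": 117.1150}


def reporter_intensities(peaks: List[Tuple[float, float]],
                         reporters: Dict[str, float] = REPORTERS_4PLEX,
                         tol: float = 0.005) -> Dict[str, float]:
    """Most intense peak within +/- tol of each reporter m/z (0.0 if absent)."""
    out = {}
    for channel, ref_mz in reporters.items():
        matches = [i for mz, i in peaks if abs(mz - ref_mz) <= tol]
        out[channel] = max(matches, default=0.0)
    return out


def channel_ratios(intensities: Dict[str, float],
                   reference: str = "114") -> Dict[str, Optional[float]]:
    """Ratio of each channel to the reference channel (None if the reference is missing)."""
    ref = intensities.get(reference, 0.0)
    if ref <= 0:
        return {ch: None for ch in intensities}
    return {ch: val / ref for ch, val in intensities.items()}
```

Because all four labeled forms of a peptide are fragmented together, one spectrum yields all channel ratios at once; in practice, ratios from several spectra of the same peptide are usually combined.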
ICAT was developed by Gygi et al. to label cysteine residues (
Gygi et al., 1999). Each tag contains a sulfhydryl-reactive iodoacetamide group and a biotin affinity group joined by either a heavy or a light linker (
Sethuraman et al., 2004;
Yan et al., 2004). The heavy linker was originally labeled with ²H, but ¹³C was later used because ²H-labeled peptides interact differently with the stationary phase and partially separate from their light counterparts during LC (
Yi et al., 2005). The ICAT reagent reacts with cysteine residues in the protein samples, which are then combined and digested, and the labeled peptides are selectively captured on an avidin column before LC-MS analysis. The relative levels of proteins in the two samples are determined from the ratio of signal intensities of each differentially mass-tagged peptide pair. Because ICAT labeling relies on cysteines, peptides (and therefore proteins) without cysteines are not detected or quantified. This limitation is especially severe for post-translational modifications such as phosphorylation, which are localized to individual peptides: a phosphopeptide can be quantified only if it also contains a cysteine.
For chemical labeling methods such as iTRAQ and ICAT, labeling is performed in vitro, so in principle all types of cells and tissues can be quantified. In addition, iTRAQ is commercially available in an eight-plex format, meaning that eight different samples can be compared quantitatively in a single LC-MS run. However, chemical labeling methods such as iTRAQ also have disadvantages. iTRAQ quantification relies on reporter ions in the low-m/z range (m/z 113–121), which ion-trap CID cannot detect because of its low-mass cutoff. As a result, on instruments such as the LTQ-Orbitrap, which is commonly used for large-scale proteomics, HCD must be used instead of CID to detect the reporter ions. With these traditional instruments, as mentioned previously, CID outperforms HCD for large-scale proteomic analysis because its faster data acquisition yields more identifications. To improve protein identification, some groups combine HCD for detecting reporter ions with CID for identifying peptides (
Dayon et al., 2010). With newer instruments, especially the Q-Exactive with its fast HCD acquisition, this limitation can be largely overcome.
Data interpretation
Once data are acquired by mass spectrometry, all of the spectral information is stored in a raw data file (for example, .raw files generated by Thermo instruments with the Xcalibur acquisition software; .raw directories generated by Waters instruments with the MassLynx acquisition software; .d directories generated by Agilent instruments with the MassHunter software). Translating the spectral information into peptide sequences requires data mining and interpretation steps; this section discusses the typical steps.
From spectra to peptide sequences
To obtain peptide sequence information, a database search is normally required, in which the experimental MS/MS spectra are compared with theoretical spectra calculated from the peptide sequences in a protein sequence database. Before the search, the information in the raw data file must be extracted and stored as peak lists in a file format accepted by the search engine. Several search algorithms have been developed for peptide identification; two of the best known and most widely used are SEQUEST (
Eng et al., 1994) and Mascot (
Perkins et al., 1999).
SEQUEST was developed by Yates’s group and first reported in 1994 (
Eng et al., 1994;
Yates et al., 1995). SEQUEST first digests the protein sequences in the database in silico with the specified site-specific enzyme (such as trypsin) to generate a list of theoretical peptides. The intact mass of each precursor ion is then compared with the masses of the theoretical peptides, and those whose masses fall within a set tolerance window of an observed precursor are considered match candidates for that precursor. For each candidate, SEQUEST generates a theoretical tandem mass spectrum and compares it with the observed tandem mass spectrum, evaluating their similarity with a cross-correlation function. The candidate whose theoretical spectrum best matches the observed spectrum is reported as the best identification for that spectrum.
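The candidate-filtering and spectrum-matching workflow described above can be illustrated with a toy search in Python. This is a simplified sketch under our own assumptions: fully tryptic peptides without missed cleavages or modifications, singly charged b- and y-ions, a 10 ppm precursor window, and a simple shared-peak count in place of SEQUEST’s cross-correlation score. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER


def precursor_mz(peptide, charge):
    return (peptide_mass(peptide) + charge * PROTON) / charge


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values."""
    masses = [RESIDUE[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def shared_peak_score(peptide, observed_mz, tol=0.02):
    """Number of observed peaks matching a theoretical fragment (stand-in for XCorr)."""
    theo = fragment_ions(peptide)
    return sum(any(abs(o - t) <= tol for t in theo) for o in observed_mz)


def search(proteins, obs_precursor_mz, charge, observed_mz, ppm=10.0):
    """Return (score, peptide) for the best candidate within the precursor tolerance."""
    best = None
    for protein in proteins:
        for pep in tryptic_digest(protein):
            theo_mz = precursor_mz(pep, charge)
            if abs(theo_mz - obs_precursor_mz) / theo_mz * 1e6 > ppm:
                continue  # outside the precursor mass window: not a candidate
            candidate = (shared_peak_score(pep, observed_mz), pep)
            if best is None or candidate > best:
                best = candidate
    return best
```

The precursor filter is what keeps the search tractable: only the few peptides within the mass window are scored against the spectrum. Real engines also consider missed cleavages, variable modifications such as phosphorylation, and higher fragment charge states, all of which enlarge the candidate list.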
Several large-scale phosphoproteomic studies have used the SEQUEST searching algorithm for identifying phosphopeptide sequences. Villén et al. reported identification of 2149 phosphoproteins, 8527 phosphopeptides and 5250 non-redundant phosphorylation sites in mouse liver by searching with the SEQUEST algorithm against a mouse IPI database (
Villen et al., 2007). Huttlin et al. used SEQUEST to analyze a data set obtained from nine mouse tissues. They identified a total of 12,039 proteins, as well as 6,296 phosphoproteins carrying nearly 36,000 phosphorylation sites, providing important information for analyzing tissue-specific phosphorylation events (
Huttlin et al., 2010).
Mascot was first reported by Perkins et al. in 1999 (
Perkins et al., 1999). The general search principle is similar to that of SEQUEST; however, Mascot assigns peptide scores with a probability-based method instead of the cross-correlation function used by SEQUEST. Each candidate match is evaluated for the probability P that it is a random event, and the match with the lowest probability is reported as the best match. A significance threshold sets the highest acceptable probability. For example, if the database contains 10⁷ sequences and the significance threshold is set at P < 0.05, significant matches are those with probabilities below 0.05/10⁷ = 5 × 10⁻⁹. The reported score is −10 log₁₀(P), so the match with the highest score is considered the best match. Probability-based scoring makes it possible to judge whether a result is significant with a simple rule, and the search parameters can be readily optimized by iteration.
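The arithmetic in the Mascot example can be checked with a few lines of Python: dividing the 0.05 significance level by 10⁷ sequences gives a per-match threshold of 5 × 10⁻⁹, which corresponds to a score threshold of about 83. This is only a sketch of the relationship described in the text, not Mascot’s internal scoring code, and the function names are hypothetical.

```python
import math


def significance_threshold(n_candidates: float, alpha: float = 0.05):
    """Per-match probability threshold and the corresponding -10*log10(P) score."""
    p_threshold = alpha / n_candidates
    score_threshold = -10 * math.log10(p_threshold)
    return p_threshold, score_threshold


def mascot_style_score(p: float) -> float:
    """Convert a random-match probability P into a -10*log10(P) score."""
    return -10 * math.log10(p)


p_thr, s_thr = significance_threshold(1e7, 0.05)
print(p_thr, round(s_thr, 1))
```

Because the score is logarithmic, every tenfold increase in database size raises the threshold by 10 score units, which is one reason searching unnecessarily large databases reduces the number of significant identifications.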
The Mascot search algorithm is widely used in phosphoproteomic research. Lee et al. used Mascot to search data from control and PP4C-depleted cells and identified 197 peptides with significant hyperphosphorylation in the PP4C-depleted cells. By further evaluating several candidate PP4C substrates, they confirmed KAP-1 as a target dephosphorylated by PP4C and revealed a new function of KAP-1 phosphorylation in DNA damage checkpoint responses (
Lee et al., 2012). Lo et al. quantitatively compared four data sets to evaluate phosphoproteomic changes at different time points during osteogenic differentiation of human mesenchymal stromal cells (hMSC). With the Mascot searching algorithm, they successfully identified 3223 quantifiable unique phosphopeptides and unraveled potential candidates mediating the osteogenic commitment of hMSCs (
Lo et al., 2012).
Other searching algorithms such as X!Tandem (
Craig and Beavis, 2003;
Craig and Beavis, 2004) and Paragon (which is integrated in ProteinPilot) (
Shilov et al., 2007) were also used in previous phosphoproteomic research. A very detailed review of the above mentioned and other searching algorithms can be found in Kapp and Schutz’s review paper (
Kapp and Schutz, 2007).
Quality control assessment
Although search algorithms usually provide scoring systems for preliminary data screening, careful evaluation is required to monitor data quality. One commonly used approach for estimating the false discovery rate (FDR) of peptide identifications is the target-decoy strategy (
Elias and Gygi, 2007). The target-decoy approach estimates how many false positives are associated with an entire data set. After searching a database containing both target (forward) and decoy (reversed) sequences, the FDR can be estimated from the numbers of target and decoy identifications above a given score threshold. Several studies have used FDRs to evaluate their phosphopeptide identifications (
Olsen et al., 2006;
Beausoleil et al., 2006;
Jiang et al., 2008).
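A minimal Python sketch of target-decoy filtering follows, assuming each peptide-spectrum match is given as a (score, is_decoy) pair from a concatenated target-decoy search. It uses the simple estimator FDR ≈ (decoy hits)/(target hits) above a score cutoff; other conventions, such as 2 × decoys/(targets + decoys), are also in use, and many tools convert these estimates into q-values. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def fdr_curve(psms: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """(score, estimated FDR) at each cutoff, walking from best to worst score."""
    targets = decoys = 0
    curve = []
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        curve.append((score, decoys / targets if targets else 1.0))
    return curve


def score_cutoff(psms: Iterable[Tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cutoff at which the estimated FDR is still <= max_fdr."""
    cutoff = None
    for score, fdr in fdr_curve(psms):
        if fdr <= max_fdr:
            cutoff = score
    return cutoff
```

Target matches scoring at or above the returned cutoff would be accepted. Since the decoys model random matches, the estimate is only as good as the assumption that false target hits and decoy hits have similar score distributions.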
Phosphosite assignment
Compared with conventional proteomics, a specific challenge of phosphoproteomics is correctly determining which amino acid in an identified phosphopeptide is phosphorylated. Although the search algorithms discussed above provide a putative phosphosite assignment, validation strategies are needed to evaluate the accuracy of these assignments. Several tools have been developed for validating and improving phosphosite assignment, such as Ascore (
Beausoleil et al., 2006), phosphoRS (
Taus et al., 2011), MDScore (
Savitski et al., 2011;
Lemeer et al., 2012), phosphoScore (
Ruttenberg et al., 2008), and software like MaxQuant, which is also used for quantitative experiments (
Cox and Mann, 2008) and ArMone (
Jiang et al., 2010).
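The notion of site-determining ions that underlies tools such as Ascore can be sketched in Python: only fragments that differ between positional isoforms provide evidence for where the phosphate sits. This is a simplified illustration, not the scoring used by Ascore, phosphoRS, or any other cited tool. It assumes that lowercase s/t/y mark phosphorylated residues, uses singly charged b- and y-ions with a fixed fragment tolerance, and simply counts matched site-determining peaks per isoform; all names are hypothetical.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PHOSPHO = 79.96633  # mass added by phosphorylation (HPO3)
WATER, PROTON = 18.010565, 1.007276


def residue_mass(aa):
    """Lowercase s/t/y denote phosphorylated residues in this sketch."""
    return RESIDUE[aa.upper()] + (PHOSPHO if aa in "sty" else 0.0)


def positional_isoforms(peptide, n_phospho=1):
    """All placements of n_phospho phosphates on S/T/Y of an unmodified sequence."""
    sites = [i for i, aa in enumerate(peptide) if aa in "STY"]
    isoforms = []
    for chosen in combinations(sites, n_phospho):
        seq = list(peptide)
        for i in chosen:
            seq[i] = seq[i].lower()
        isoforms.append("".join(seq))
    return isoforms


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values, including phosphate mass."""
    masses = [residue_mass(aa) for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def site_determining_ions(isoform, others, tol=0.02):
    """Fragments of `isoform` that no competing isoform produces (within tol)."""
    other_ions = [m for o in others for m in fragment_ions(o)]
    return [m for m in fragment_ions(isoform)
            if all(abs(m - o) > tol for o in other_ions)]


def localize(peptide, observed_mz, n_phospho=1, tol=0.02):
    """Number of observed peaks matching each isoform's site-determining ions."""
    isoforms = positional_isoforms(peptide, n_phospho)
    scores = {}
    for iso in isoforms:
        others = [o for o in isoforms if o != iso]
        sdi = site_determining_ions(iso, others, tol)
        scores[iso] = sum(any(abs(o - s) <= tol for s in sdi) for o in observed_mz)
    return scores
```

For example, in GSAGTK the b- and y-ions spanning both candidate residues are identical for the two isoforms and carry no site information, whereas b2 to b4 and y2 to y4 differ and decide between phosphoserine and phosphothreonine. Published tools turn such counts into probabilities that account for peak density and fragment tolerance.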
Quantification software
For large-scale global phosphoproteomic quantification, manual analysis is time-consuming and unreliable. Phosphoproteomic data can be analyzed quantitatively with either commercial or in-house software. One popular program that supports analysis of all the labeling methods is MaxQuant, developed by Mann’s group and freely available from the MaxQuant home page (
Cox and Mann, 2008;
Cox et al., 2009). Earlier versions of MaxQuant depended on the Mascot search engine, but the latest version integrates the Andromeda search engine, so it can run without a Mascot installation (
Cox et al., 2011). Other recently reported quantification software includes MSQuant (
Mortensen et al., 2010), iQuantitator, which is specifically for analyzing iTRAQ-based results (
Schwacke et al., 2009), and ProtQuant, which is specifically for label-free quantification (
Bridges et al., 2007). Commercial software such as Proteome Discoverer, which comes with Thermo Scientific instruments, can also conveniently evaluate quantification results.
Phosphoproteomic approaches in cancer research
With the rapid development of phosphoproteomic technology, including improved experimental methods for identifying and quantifying phosphosites, phosphopeptides, and phosphoproteins as well as advances in instrumentation and software, phosphoproteomics has become widely used in cancer research. In this section, we briefly review some of the discoveries made with phosphoproteomic technology in cancer research.
Phosphoproteomic research on DNA damage
It is well known that errors introduced into the genome during the repair of DNA damage are causal factors in the development of cancer. DNA damage can be induced by environmental stimuli, including chemical insults and irradiation. DNA damage repair is regulated through complex pathways involving many phosphorylation events. DNA damage repair and its relationship to carcinogenesis have been well summarized (
Hoeijmakers, 2009;
Jackson and Bartek, 2009;
Ciccia and Elledge, 2010).
Phosphoproteomic technologies have facilitated the study of phosphorylation events in DNA damage repair. In a recent paper, Beli et al. reported a detailed phosphorylation-dependent signaling network in response to DNA damage in human osteosarcoma (U2OS) cells. They quantified 11,509 phosphorylation sites in cells treated with ionizing radiation (IR) and 11,540 phosphorylation sites in cells treated with etoposide, and described complex functional modules of phosphoproteins regulated by the DNA damage response (DDR). They showed that the splicing-regulator phosphatase PPM1G is recruited to sites of DNA damage, whereas the splicing-associated protein THRAP3 is excluded from these regions, and observed that THRAP3-depleted cells are hypersensitive to DNA-damaging agents. Their phosphoproteomic study provided valuable information on DNA damage signaling networks (
Beli et al., 2012).
In clinical cancer treatment, radiotherapy and some chemotherapeutic drugs that induce DNA damage, and thus cell death, are widely used. However, their efficacy is limited because some tumors can repair DNA damage and become radioresistant. In his review, Jorgensen suggested that, to apply DDR pathway inhibitors clinically to reduce radioresistance, the differences in DDR pathways between tumor and normal cells need to be carefully studied, and molecules that preferentially target DNA repair in the tumor but not in the surrounding normal tissue need to be identified (
Jorgensen, 2009). This strategy underscores the importance of in-depth study of DDR pathways. Several reports have studied the phosphorylation events of DDR pathways in cancer cells or tumor tissues (
Powell and Kachnic, 2008;
Bensimon et al., 2010). However, high-throughput comparisons of cancerous and normal cells or tissues are still limited. Advances in quantitative phosphoproteomic technology are expected to reveal additional targets of potential clinical value.
Phosphoproteomic research on metastasis
Metastasis is a major clinical problem, with 90% of cancer mortality attributed to metastasis (American Association for Cancer Research Progress Report 2011, http://www.aacr.org/home/public–media/science-policy–government-affairs/cancer-progress-report.aspx). Metastatic tumors are highly invasive and can spread from their site of origin to other sites in the body, so the commonly used combination of surgery with chemotherapy or radiotherapy is often insufficient, making clinical treatment more challenging.
A recent study by Mann’s group used quantitative proteomic approaches to compare two healthy human mammary epithelial cell lines, two premalignant cell lines, and seven breast cancer cell lines derived from tumors of defined breast cancer stages (
Geiger et al., 2012). They quantified 7,800 proteins from these cell lines and obtained a stage-specific signature of breast cancer progression. In another report, Semaan et al. used phosphoproteomic technology to quantitatively evaluate the phosphoproteomes of benign breast tissue, primary breast cancer tissue, and metastatic breast cancer tissue from a single patient with triple-negative breast cancer (
Semaan et al., 2011). They discovered five highly phosphorylated proteins in the metastatic sample, and six highly phosphorylated proteins in the cancer sample.
The phosphoproteome of metastatic lung cancer has also been examined. Wang et al. examined metastasis-associated phosphoproteomic alterations in a series of non-small cell lung cancer cell lines with varying degrees of in vivo invasiveness. They quantified 854 phosphoproteins with 1,796 unique phosphopeptides and found that nearly 40% of the phosphopeptides showed more than twofold changes in abundance in the highly invasive cells. They suggested that both the level and the location of phosphorylation might play important roles in the complex process of cancer progression (
Wang et al., 2010). The work summarized above demonstrates the feasibility of identifying stage-specific phosphorylation-related biomarkers in cancer cells or tumors and of understanding the mechanisms of metastasis, which opens the possibility of developing new therapies for metastatic cancers.
Phosphoproteomic research for the discovery of biomarkers and targets for cancer diagnosis and new anti-cancer therapies
Research on cancer biomarkers and targets for anti-cancer therapies is an important field. Numerous reports and reviews describing the discovery of new biomarkers and targets using proteomic technologies have recently been published (
Kong et al., 2006;
Lu et al., 2007;
Christensen et al., 2008;
Xiao et al., 2008;
Hung and Yu, 2010;
Indovina et al., 2012). Wang et al. characterized the phosphoproteome in androgen-repressed human prostate cancer (ARCaP) cells (
Wang et al., 2011). They investigated the phosphoproteome profiles of ARCaP cells to reveal the molecular mechanisms underlying their aberrant responses to androgen and identified 385 phosphoproteins. These included proteins participating in the mammalian target of rapamycin (mTOR) and E2F signaling pathways, as well as phosphorylated androgen-induced proliferation inhibitor (APRIN), providing potential phosphorylated biomarkers and targets for ARCaP cells.
Oyama et al. used quantitative phosphoproteomic and transcriptomic methods to compare the responses of wild-type and tamoxifen-resistant (TamR) MCF-7 human breast cancer cells to stimulation with 17β-estradiol and heregulin (
Oyama et al., 2011). They identified 286 proteins whose phosphorylation levels changed and 1,603 genes whose expression levels changed in response to ligand stimulation. Their study provided a better understanding of the signaling pathways and transcriptional processes in TamR tumor cells, and thus suggested possible targets for treating TamR cancers. More detailed information about strategies for, and developments in, exploring biomarkers and drug targets for cancer therapy using phosphoproteomic technologies can be found in several reviews (
Lim, 2005;
Yu et al., 2007;
Metodiev and Alldridge, 2008).
Phosphoproteomic research to explore kinase/phosphatase networks and signaling pathways in cancer models
In cancer research, the study of signaling pathways and the elucidation of kinase/phosphatase networks are important areas. With the development of phosphoproteomics, it has become possible to monitor changes in the phosphorylation status of multiple members of a pathway in response to a given drug treatment or perturbation. In addition, high-throughput mass spectrometric studies have identified many novel phosphorylation events, expanding our knowledge of existing networks and signaling pathways in cancer models.
The TGF-beta signaling pathway has been reported to have seemingly contradictory effects, acting both as a tumor promoter and as a tumor suppressor. Ali and Molloy used the SW480 colon cancer cell line to investigate the TGF-beta signaling pathway with quantitative phosphoproteomic methods. By profiling SW480 cells stably expressing Smad4, a key regulator of the TGF-beta pathway, with a SILAC quantification approach, they found that TGF-beta stimulation upregulated 17 phosphoproteins and downregulated 8, including several previously unreported phosphorylation events (
Ali and Molloy, 2011).
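To illustrate how a SILAC comparison such as this one yields lists of up- and downregulated phosphoproteins, the following Python sketch computes a heavy/light intensity ratio for each phosphopeptide, converts it to a log2 fold change, and classifies it with a fold-change cutoff. It assumes the light channel is the unstimulated control and the heavy channel the stimulated sample; the peptide names, intensities, and the 1.5-fold threshold are hypothetical, and the cited study's actual processing and significance criteria may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class SilacPeptide:
    name: str     # hypothetical phosphopeptide identifier
    light: float  # intensity in the light-labeled (assumed unstimulated) sample
    heavy: float  # intensity in the heavy-labeled (assumed stimulated) sample


def log2_ratio(p: SilacPeptide) -> float:
    """Return log2(heavy/light); positive values mean higher in the stimulated sample."""
    return math.log2(p.heavy / p.light)


def classify(peptides, fold_cutoff=1.5):
    """Split peptides into up, down, and unchanged lists by their log2 ratio.

    fold_cutoff is an assumed threshold, not a value taken from the cited study.
    """
    threshold = math.log2(fold_cutoff)
    up, down, unchanged = [], [], []
    for p in peptides:
        r = log2_ratio(p)
        if r >= threshold:
            up.append(p.name)
        elif r <= -threshold:
            down.append(p.name)
        else:
            unchanged.append(p.name)
    return up, down, unchanged


if __name__ == "__main__":
    data = [
        SilacPeptide("pep_A", light=1.0e6, heavy=3.2e6),  # about 3.2-fold higher
        SilacPeptide("pep_B", light=2.0e6, heavy=0.5e6),  # 4-fold lower
        SilacPeptide("pep_C", light=1.5e6, heavy=1.6e6),  # essentially unchanged
    ]
    up, down, unchanged = classify(data)
    print("up:", up, "down:", down, "unchanged:", unchanged)
```

Because both SILAC channels are mixed and measured in the same LC-MS run, the ratio is taken within a single spectrum pair; in practice, ratios are usually normalized, aggregated across peptides, and tested for significance before a phosphoprotein is called regulated.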
Tyrosine phosphorylation occurs relatively rarely compared with serine and threonine phosphorylation. However, many biological studies have shown the importance of tyrosine phosphorylation in regulating different pathways in cancer. A well-known example is EGFR, which can be phosphorylated at several tyrosine sites, thereby activating downstream pathways. EGFR phosphorylation is considered to play important roles in regulating cell survival and proliferation; consequently, aberrant EGFR phosphorylation and disruption of its downstream signaling pathways are closely associated with the origin and progression of cancer. Because EGFR and KRAS are mutated in many lung tumors, Guha et al. used SILAC to quantitatively compare the phosphoproteomes of normal human bronchial epithelial cells and cells carrying EGFR or KRAS mutations. They also compared these with isogenic human lung adenocarcinoma cell lines that naturally harbor EGFR or KRAS mutations, and obtained detailed information on the effects of EGFR mutations on downstream signaling pathways (
Guha et al., 2008). Tasaki et al. (2010) mutated tyrosine 992 (Y992) of EGFR and examined the effects on the EGF signaling network. Their work provided a network model of the disorders in EGFR downstream signaling caused by EGFR mutation. Zhang et al. (2010a) carried out label-free quantitative phosphoproteomic studies to evaluate changes in EGFR phosphorylation caused by somatic activating EGFR mutations and by the EGFR inhibitor erlotinib. In 31 lung cancer cell lines, they confirmed that 3 EGFR phosphorylation sites were associated with somatic activating mutations and 3 with erlotinib. Ruan et al. studied the phosphoproteome of nasopharyngeal carcinoma (NPC), discovered 28 novel EGFR-regulated proteins, and constructed a signaling network regulated through EGFR phosphorylation in NPC models (
Ruan et al., 2011). More detailed discussions of mass spectrometry-based studies of EGFR signaling and tyrosine phosphorylation can be found in the reviews by Morandell et al., Biarc et al., and Huang (
Morandell et al., 2008;
Biarc et al., 2011;
Huang, 2012).
Conclusions and future directions
Phosphoproteomics is a fast-developing field that can provide large-scale, high-throughput phosphorylation information. Although phosphoproteins mostly exist at low abundance and are difficult to detect, the development of efficient and sensitive phosphopeptide enrichment methods, together with progress in high-performance LC-MS instruments, has made it feasible to detect phosphopeptides by mass spectrometry. With the introduction of label-free and isotope labeling methods for quantitative analysis, phosphoproteomics gained the ability to assess changes in phosphorylation levels under different conditions and to evaluate relative or absolute phosphorylation abundance in a given sample. As phosphoproteomic technologies improve, more phosphorylation data sets can be acquired from tumors and cancer cell lines. Quantitative identification of these phosphorylation events will provide a blueprint for understanding the mechanisms of cancer origin and development and the key regulators in different types of tumors, and will supply possible biomarkers and targets for clinical diagnosis and treatment.
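Unlike SILAC, label-free quantification compares intensities measured in separate LC-MS runs, so global run-to-run differences (for example, in sample loading) must be corrected before fold changes are meaningful. The Python sketch below shows one simple and widely used correction, median normalization of log2 intensities. It assumes each run is a dictionary mapping hypothetical phosphopeptide identifiers to raw intensities; real pipelines also perform retention-time alignment, missing-value handling, and statistical testing, which are omitted here.

```python
import math
from statistics import median


def median_normalize(run: dict[str, float]) -> dict[str, float]:
    """Log2-transform a run and shift it so its median log2 intensity is zero.

    This removes a global scaling difference (e.g., unequal loading) between runs.
    Non-positive intensities are dropped as unquantified.
    """
    logs = {pep: math.log2(val) for pep, val in run.items() if val > 0}
    center = median(logs.values())
    return {pep: v - center for pep, v in logs.items()}


def label_free_log2_fc(control: dict[str, float], treated: dict[str, float]) -> dict[str, float]:
    """Log2 fold change (treated vs. control) for peptides quantified in both runs."""
    c = median_normalize(control)
    t = median_normalize(treated)
    shared = c.keys() & t.keys()
    return {pep: t[pep] - c[pep] for pep in sorted(shared)}


if __name__ == "__main__":
    # Hypothetical data: the treated run was loaded at twice the amount overall,
    # and only p3 truly changed (4-fold up relative to the other peptides).
    control = {"p1": 100.0, "p2": 200.0, "p3": 400.0}
    treated = {"p1": 200.0, "p2": 400.0, "p3": 3200.0}
    print(label_free_log2_fc(control, treated))
```

Without normalization every peptide in this example would appear at least 2-fold up; after median centering, only p3 shows a change (log2 fold change of about 2), which is the kind of error during quantification that careful normalization is meant to avoid.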
Mass spectrometry-based phosphoproteomics is still a developing and expanding technology. Nonspecific enrichment caused by competition from non-phosphorylated peptides, insufficient enrichment of multiply phosphorylated peptides, and errors introduced during quantification are challenges that still limit the application of phosphoproteomics in cancer research. The development of faster and more sensitive mass spectrometers, more specific and efficient phosphopeptide enrichment methods, more accurate and convenient labeling and quantification methods, and improved data interpretation algorithms will increase the accessibility and utility of phosphoproteomics, and such advances are eagerly anticipated in future cancer research.
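The first two challenges named above are commonly tracked with two simple figures of merit: enrichment specificity (the fraction of identified peptides that carry at least one phosphosite) and the fraction of phosphopeptides that are multiply phosphorylated. The Python sketch below computes both from a list of identified peptide sequences. The notation, in which a lowercase "p" before S, T, or Y marks a phosphorylated residue (for example "AGpSPR"), is a hypothetical convention chosen for illustration, not the output format of any particular search engine.

```python
import re

# Hypothetical notation: "pS", "pT", "pY" mark phosphorylated residues in an
# otherwise uppercase peptide sequence.
PHOSPHO_SITE = re.compile(r"p[STY]")


def enrichment_summary(peptides: list[str]) -> dict[str, float]:
    """Return enrichment specificity and the multiply phosphorylated fraction.

    specificity    = phosphopeptides / all identified peptides
    multi_fraction = peptides with >= 2 phosphosites / phosphopeptides
    """
    site_counts = [len(PHOSPHO_SITE.findall(seq)) for seq in peptides]
    total = len(site_counts)
    phospho = sum(1 for n in site_counts if n >= 1)
    multi = sum(1 for n in site_counts if n >= 2)
    return {
        "specificity": phospho / total if total else 0.0,
        "multi_fraction": multi / phospho if phospho else 0.0,
    }


if __name__ == "__main__":
    identified = ["AGpSPR", "LLEK", "pTApYK", "GGSR"]  # hypothetical identifications
    print(enrichment_summary(identified))
```

A low specificity indicates that non-phosphorylated peptides are competing for the enrichment material, while a low multiply phosphorylated fraction can point to the under-recovery of multiphosphorylated peptides noted above.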
Higher Education Press and Springer-Verlag Berlin Heidelberg