Introduction
The field of proteomics has expanded enormously in the last two decades. Multiple factors have contributed to this rapid expansion, including ever-evolving mass spectrometry instrumentation, new sample preparation methods, genomic sequencing of numerous model organisms allowing database searching of proteomes, improved quantitation capabilities, and the availability of bioinformatic tools. The ability to investigate the proteomes of numerous biologic samples and to generate future hypothesis-driven experiments makes proteomics and biomarker studies exceedingly popular in biologic studies today. In addition, advances in post-translational modification (PTM) analysis and quantification further enhance the utility of mass spectrometry (MS)-based proteomics. A subset of proteomics research is devoted to profiling and quantifying neurologically related proteins and endogenous peptides, and this area has progressed rapidly in the past decade. This review provides a general overview, as outlined in Fig. 1, of proteomics technology, including methodological and conceptual improvements, with a focus on recent studies and neurological biomarker studies.
Biologic material selection
The choice of biologic matrix is an important first step in any proteomics analysis. The ease of sample collection (e.g., urine, plasma, or saliva) versus usefulness or localization of sample (e.g., specific tissue or proximity fluid) needs to be evaluated early on in a study design.
Plasma, derived by centrifugation of blood to remove whole cells, is a very popular choice in proteomics because of its high protein content (~65 mg/mL (
Liu et al., 2006b)), the ubiquitous nature of blood in the body, and the ability to obtain large sample amounts or multiple time points without sacrificing the animal or performing invasive techniques. Plasma is centrifuged immediately after sample collection, unlike serum, where coagulation needs to occur first. To obtain plasma, blood is collected in a tube containing an anticoagulant (EDTA, heparin, or citrate) and centrifuged, but previous reports have shown variable results when heparin is used as the anticoagulant (
Holten-Andersen et al., 1999). The Human Proteome Organization (HUPO) specifically recommends EDTA or citrate as the anticoagulant for plasma (
Rai et al., 2005;
Tammen et al., 2005). One of the primary concerns with plasma is degradation of the protein content via endogenous proteases found in the sample (
Lippi et al., 2006). One way to address this problem is the use of protease inhibitors. In addition, freeze/thaw cycles need to be minimized to prevent protein degradation and variability (
Holten-Andersen et al., 2003;
Ytting et al., 2007). Plasma proteomics has seen extensive coordinated efforts to start assessing the diagnostic needs using plasma (
Hanash, 2004). HUPO also has established a public human database for plasma and serum proteomics from 35 collaborating laboratories (
Omenn et al., 2005). Large dynamic range studies have been performed on plasma with a starting sample amount of 2625 µl (157.5 mg), resulting in the identification of 3654 proteins at a false discovery rate below 5% (
Liu et al., 2006a).
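The sample amounts quoted above follow from a simple volume-times-concentration relationship. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the 60 mg/mL figure is only the concentration implied by 157.5 mg in 2625 µl, not a value stated by the cited study.

```python
def protein_mass_mg(volume_ul, conc_mg_per_ml):
    """Total protein mass (mg) in a sample of the given volume and concentration."""
    return volume_ul / 1000.0 * conc_mg_per_ml


def implied_concentration_mg_per_ml(mass_mg, volume_ul):
    """Concentration (mg/mL) implied by a protein mass and a sample volume."""
    return mass_mg / (volume_ul / 1000.0)


# Values quoted in the text: 2625 µl of plasma containing 157.5 mg of protein.
print(implied_concentration_mg_per_ml(157.5, 2625))

# Assumed concentration of 60 mg/mL (derived above, not taken from the source).
print(protein_mass_mg(2625, 60.0))
```

The implied concentration is of the same order as the ~65 mg/mL plasma protein content cited earlier, which is a useful sanity check when planning how much starting material a large-scale experiment requires.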
The large dynamic range, spanning 11 orders of magnitude as visualized in Fig. 2, is one of the biggest obstacles in plasma proteomics. Figure 2 also shows that as lower abundance proteins are investigated, the origins of the identified proteins become more diverse than those of the most abundant proteins. Recent mining of the plasma proteome demonstrated the ability to search for disease biomarkers across seven orders of magnitude. In addition, the tissues of origin of the identified plasma proteins were determined, and these origins became more diverse as protein concentration decreased (
Zhang et al., 2011c). Plasma has been used as a source for biomarker studies such as colorectal cancer (
Matsubara et al., 2011;
Murakoshi et al., 2011), cardiovascular disease (
Addona et al., 2011), and abdominal aortic aneurysm (
Acosta-Martin et al., 2011). Even though the blood-brain barrier prevents direct blood-to-brain interaction, the proteomes of neurological disorders such as Alzheimer’s disease (AD) have been studied using plasma (
Ray et al., 2007).
An alternative sample derived from blood is serum, which is obtained by allowing blood to coagulate instead of adding anticoagulants. Coagulation usually takes about 30 min, during which significant and random degradation by endogenous proteases can occur. The additional variability introduced by the coagulation process can change the concentration of multiple, potentially valuable, biomarkers. Because biological variability between samples or organisms is already a challenge, additional variability due to serum generation may be undesirable, but serum is still used for disease biomarker studies (
Lopez et al., 2011). Serum has been used to compare the proteome differences in neurological diseases such as AD, Parkinson’s disease, and amyotrophic lateral sclerosis and a review can be found elsewhere discussing the subject (
Sheta et al., 2006).
Cerebrospinal fluid (CSF) has a long history as a surrogate biopsy of brain or spinal cord in evaluating diseases of the central nervous system and has been used for studies in neurological disorders due to being a rich source of neuro-related proteins and peptides (
Zhang et al., 2005). The protein composition of the most abundant proteins in CSF is well defined, and numerous studies exist to broaden the proteins identified (
Maccarrone et al., 2004;
Wenner et al., 2004;
Yuan and Desiderio, 2005c). CSF has an exceedingly low protein content (~0.4 μg/μL) which is ~100 times lower than serum or plasma, and over 60% of the total protein content in CSF consists of a single protein, albumin (
Wong et al., 2000;
Yuan and Desiderio, 2005b;
Roche et al., 2008). In addition, protein concentrations span up to 12 orders of magnitude, further complicating analysis and masking proteins that are biologically relevant to a given study (
Rozek et al., 2007). One of the largest sets of identified proteins was reported by Schutzer et al., with 2630 non-redundant proteins from 14 mL of pooled human CSF. This study involved the removal of highly abundant proteins by IgY-14 immunodepletion followed by two-dimensional (2D) liquid chromatography (LC) separation (
Schutzer et al., 2010). Studies have also been performed to characterize individual biomarkers or complex patterns of biomarkers in various diseases in the CSF (
Zougman et al., 2008;
Sjödin et al., 2010). One potential pitfall of CSF proteomic analysis is contamination from blood, which can be identified by counting the red blood cells present or by examining surrogate markers of blood contamination other than hemoglobin, such as peroxiredoxin, catalase, and carbonic anhydrase (
You et al., 2005). A proof of principle CSF peptidomics study identified numerous endogenous peptides associated with the central nervous system, which can be used as a bank for neurological disorder studies (
Yuan and Desiderio, 2005a). Numerous recent reports highlighted the utility of CSF analysis for biomarker studies in AD (
Jahn et al., 2011;
Ringman et al., 2012) and medulloblastoma (
Rajagopal et al., 2011), in both post-mortem and ante-mortem samples (
Giron et al., 2011).
Cellular lysates offer the distinct advantage of working with a cell line, yeast, or bacteria that provide large amounts of protein for analysis (
Michalski et al., 2011;
Ting et al., 2011), with
Saccharomyces cerevisiae being the most common cell lysate (
Spirin et al., 2011;
Kellie et al., 2012). Other cell types are also used, including HeLa (
Wilhelm et al., 2012) and
E. coli (
Zhou et al., 2011). The ability to easily obtain milligrams of protein and to scale up experiments without animal sacrifice is a clear advantage in biologic sample selection. Current literature supports cellular lysate as a valued and sought-after source of proteins for large-scale proteomics experiments because of the ability to assess treatments, conditions, and testable hypotheses (
Kellie et al., 2012;
Shteynberg et al., 2011;
Winter and Steen, 2011). Cellular lysate from the rat B104 neuroblastoma cell line was used as an
in vitro model for cerebral ischemia and showed abundance changes in multiple proteins involved in various neurological disorders (
Datta et al., 2010).
Other sources of biologic samples
Urine
The urine proteome appears to be another attractive reservoir for biomarker discovery because of its relatively low complexity compared with the plasma proteome and the noninvasive collection of urine. Urine is often considered an ideal source of biomarkers for renal diseases because, in healthy adults, approximately 70% of the urine proteome originates from the kidney and the urinary tract (
Decramer et al., 2008); thus, the use of urine to identify neurological disorders has been neglected. However, strong evidence has shown that proteins associated with neurodegenerative diseases can be excreted in the urine (
Kuwabara et al., 2009;
De La Monte and Wands, 2001;
Van Dorsselaer et al., 2011), indicating that urine proteomics could be a useful approach to the discovery of biomarkers and the development of diagnostic assays for neurodegenerative diseases. However, the current view of the urine proteome is still limited by factors such as sample preparation techniques and the sensitivity of mass spectrometers. There has been a tremendous drive to increase the coverage of the urine proteome. In a recent study, Court
et al. compared and evaluated several different sample preparation methods with the objective of developing a standardized, robust, and scalable protocol that could be used in biomarker development by shotgun proteomics (
Court et al., 2011). In another study, Marimuthu
et al. reported the largest catalog of proteins in urine identified in a single study to date. The proteomic analysis of urine samples pooled from healthy individuals was conducted by using high-resolution Fourier transform mass spectrometry. A total of 1823 proteins were identified, of which 671 proteins have not been previously reported in urine (
Marimuthu et al., 2011).
Saliva
For diagnostic purposes, saliva collection has the advantage of being easy and non-invasive. Recent studies on salivary proteins that are critically involved in AD and Parkinson’s disease suggest that saliva could be an important sample source for identifying biomarkers of neurodegenerative diseases. Bermejo-Pareja
et al. reported that the level of salivary A
β42 in patients with mild AD was noticeably increased compared to a group of controls (
Bermejo-Pareja et al., 2010). In another study, Devic
et al. identified two of the most important Parkinson's disease related proteins, α-synuclein (α-Syn) and DJ-1, in human saliva (
Devic et al., 2011). They observed that salivary α-Syn levels tended to decrease while DJ-1 levels tended to increase in Parkinson's disease. The published results from this study also suggest that α-Syn might correlate with the severity of motor symptoms in Parkinson's disease. Because recent advancements in MS-based proteomics have provided promising results in utilizing saliva to explore biomarkers for both local and systemic diseases (
Al-Tarawneh et al., 2011;
Castagnola et al., 2011), further profiling of the saliva proteome will provide a valuable source for biomarker discovery in neurodegenerative diseases.
Tissue
Compared to body fluids such as plasma, serum, and urine, where proteomic analysis is complicated by the wide dynamic range of protein concentrations, the analysis of tissue homogenates using well-established, conventional proteomic techniques has the advantage of a reduced dynamic range. However, homogenization and extraction discard spatial information, which makes this approach inadequate for detecting biomarkers whose localization and distribution play important roles in disease development and progression. Matrix-assisted laser desorption/ionization (MALDI) imaging mass spectrometry (IMS) is a method that allows the investigation of a wide range of molecules, including proteins, peptides, lipids, drugs, and metabolites, directly in thin slices of tissue (
Caprioli et al., 1997;
Chaurand et al., 1999;
Stoeckli et al., 2001;
Chaurand et al., 2002). Because this technology allows for identification and simultaneous localization of biomolecules of interest in tissue sections, linking the spatial expression of molecules to histopathology, MALDI-IMS has been utilized as a powerful tool for the discovery of new cancer biomarker candidates as well as other clinical applications (
Cazares et al., 2011;
Seeley and Caprioli, 2011). The use of MALDI-IMS on human or animal brain tissue to identify or map the distribution of molecules related to neurodegenerative diseases was also recently reported (
Stauber et al., 2008;
Yuki et al., 2011).
Secretome
There has been increasing interest in the study of proteins secreted by various cells (the secretomes) from tissue-proximal fluids or conditioned media as a potential source of biomarkers. Cell secretomes mainly comprise proteins that are secreted or shed from the cell surface, and these proteins can play important roles in both physiologic processes (e.g., cell signaling, communication, and migration) and pathological processes, including tumor angiogenesis, differentiation, invasion, and metastasis. In particular, the study of cancer cell secretomes by MS-based proteomics has offered new opportunities for cancer biomarker discovery, as tumor proteins may be secreted or shed into the bloodstream and could be used as noninvasive biomarkers. The latest advances and challenges in sample preparation, sample concentration, and separation techniques used specifically for secretome analysis, and their clinical applications in the discovery of disease-specific biomarkers, have been comprehensively reviewed (
Makridakis and Vlahou, 2010;
Dowling and Clynes, 2011). Here, we highlight only the proteomic profiling of neural cell secretomes, which has been applied in the neurosciences to better understand the roles that secreted proteins play in response to brain injury and neurological diseases. The LC-MS shotgun identification of proteins released by astrocytes has been recently reported (
Dowell et al., 2009;
Keene et al., 2009;
Moore et al., 2009). In these studies the changes observed in the astrocyte secretomes induced by inflammatory cytokines or cholinergic stimulation were investigated. Alternatively, our group performed 2D-LC separation and included cytoplasmic protein extract from astrocytes as a control to identify cytoplasmic protein contaminants which are not actively secreted from cells (
Dowell et al., 2009).
Sample preparation
Proteomic analysis and biomarker discovery research in biologic samples such as body fluids, tissues, and cells are often hampered by the vast complexity and large dynamic range of the proteins. Because disease-identifying biomarkers are more likely to be low-abundance proteins, it is imperative to remove the high-abundance proteins or apply enrichment techniques to allow detection and better coverage of the low-abundance proteins by MS analysis. Several strategies, including depletion and protein equalizer approaches, have been used during sample preparation to reduce sample complexity (
Ahmed, 2009b;
Righetti et al., 2011), and the latest advances of these methods have been reviewed by Selvaraju and Rassi (
2012). Alternatively, the complexity of biologic samples can be reduced by capturing a specific subproteome that may carry the biologic information of interest. The latter strategy is especially useful in biomarker discovery, where changes in the proteome are reflected not solely in the concentration of specific proteins but also in changes in post-translational modifications (PTMs). Here, we mainly discuss the enrichment of phosphoproteins/peptides and glycoproteins/peptides, and sample preparation for peptidomics and membrane proteins.
Phosphoproteomics
Phosphorylation can act as a molecular switch on a protein, turning it on or off within the cell. It is thought that up to 30% of proteins can be phosphorylated (
Kalume et al., 2003), and phosphorylation plays significant roles in biologic processes such as the cell cycle and signal transduction (
Cohen, 2000). Tens of thousands of phosphorylation sites can be proposed using the analytical methods available today (
Nagaraj et al., 2010;
Olsen et al., 2010). The amino acids targeted in phosphorylation studies are serine, threonine, and tyrosine, with detection frequency typically decreasing in that order. Other amino acids have been reported to be phosphorylated, but traditional phosphoproteomics experiments ignore these rare events (
Wagner and Vu, 2000).
In a typical large-scale phosphoproteomics experiment, the sample size is usually in milligram amounts to account for the low stoichiometry of phosphorylated proteins. The large amount of protein is then digested, typically with trypsin; alternatively, experiments have been performed with Lys-C digestion to produce larger enzymatic peptides. The larger peptides produced by Lys-C yield more highly charged peptides during electrospray ionization (ESI) and allow improved electron-based fragmentation to determine specific sites of phosphorylation (
Chi et al., 2007). Phosphopeptides must be enriched from the pool of peptides; otherwise, they will be masked by the vast number and higher ionization efficiency of non-phosphorylated peptides. The two most common enrichment techniques are immobilized metal ion affinity chromatography (IMAC) and metal oxide affinity chromatography (MOAC), TiO
2 being the most common oxide used for this purpose. A recent study reported that phosphorylation of neuronal intermediate filament proteins in neurofibrillary tangles is involved in Alzheimer’s disease (
Rudrabhatla et al., 2011).
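The effect of enzyme choice on peptide length described above can be illustrated with an in silico digestion. The sketch below is not taken from any of the cited studies; it assumes simplified cleavage rules (trypsin cleaving C-terminal to K or R unless followed by P, Lys-C cleaving C-terminal to K), no missed cleavages, and a hypothetical test sequence.

```python
import re

# Simplified cleavage rules (assumptions for illustration only):
#   trypsin: C-terminal to K or R, except when the next residue is P
#   lys-c:   C-terminal to K
CLEAVAGE_RULES = {
    "trypsin": r"[KR](?!P)",
    "lys-c": r"K",
}


def digest(sequence, enzyme):
    """Return fully cleaved peptides (no missed cleavages) as a list of strings."""
    cuts = [m.end() for m in re.finditer(CLEAVAGE_RULES[enzyme], sequence)]
    starts = [0] + cuts
    ends = cuts + [len(sequence)]
    # Skip the empty slice produced when the sequence ends at a cleavage site.
    return [sequence[i:j] for i, j in zip(starts, ends) if i < j]


def mean_length(peptides):
    """Average peptide length in residues."""
    return sum(len(p) for p in peptides) / len(peptides)


# Hypothetical sequence, not a real protein.
seq = "MKRAGKPSTRLLKDESRGHK"
for enzyme in CLEAVAGE_RULES:
    peptides = digest(seq, enzyme)
    print(enzyme, peptides, round(mean_length(peptides), 2))
```

Because Lys-C ignores arginine, it produces fewer and longer peptides from the same sequence, and those peptides retain internal basic residues that can carry additional charge during ESI.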
Glycoproteomics
Protein glycosylation is one of the most common and complicated forms of PTM. Protein glycosylation in eukaryotes is categorized as either N-linked, where glycans are attached to asparagine residues in the consensus sequence N-X-S/T (X can be any amino acid except proline) via an N-acetylglucosamine (GlcNAc) residue, or O-linked, where the glycans are attached to serine or threonine. Glycosylation plays a fundamental role in numerous biologic processes, and aberrant alterations in protein glycosylation are associated with neurodegenerative disease states, such as Creutzfeldt-Jakob disease (CJD) and AD (
Sáez-Valero et al., 2003;
Silveyra et al., 2006). Due to the low abundance of glycosylated forms of proteins compared to non-glycosylated proteins, it is essential to enrich glycoproteins or glycopeptides in complex biologic samples prior to MS analysis. Two of the most common enrichment methods used in glycoproteomics are lectin affinity chromatography (LAC) and hydrazide chemistry. The detailed methodologies of LAC, hydrazide chemistry and other enrichment methods in glycoproteomics have been extensively reviewed in the past (
Tian and Zhang, 2010;
Wei and Li, 2009). In particular, LAC is of great interest in studies of glycosylation alterations as markers of AD and other neurodegenerative diseases due to its recent applications in brain glycoproteomics (
Butterfield and Owen, 2011). Our group has utilized multi-lectin affinity chromatography containing concanavalin A (ConA) and wheat germ agglutinin (WGA) to enrich N-linked glycoproteins in control and prion-infected mouse plasma (
Wei et al., 2010b). This method enabled us to identify a low-abundance glycoprotein, serum amyloid P-component (SAP). PNGase F digestion and Western blotting validation confirmed that the glycosylated form of SAP was significantly elevated in mice with early prion infection, and that it could potentially be used as a diagnostic biomarker for prion diseases.
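The N-X-S/T consensus sequence for N-linked glycosylation described above lends itself to a simple sequence scan. The following is a minimal sketch using a regular expression; the function name and test sequence are hypothetical. A matching sequon is only a candidate site, since the presence of the motif does not show that the asparagine is actually glycosylated.

```python
import re

# N-X-S/T with X != P. The lookahead keeps the match one residue wide,
# so overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_glyco_sequons(sequence):
    """Return 1-based positions of asparagines that sit in an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


# Hypothetical sequence, not a real protein: contains NGS, NPT and NKT.
print(find_n_glyco_sequons("MNGSANPTNKT"))
```

In this example the NPT motif is skipped because proline occupies the X position, whereas NGS and NKT are reported as candidate sites.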
Membrane proteins
Membrane proteins play an indispensable role in maintaining the structural integrity of cells and perform many important functions, including signal transduction, intercellular communication, vesicle trafficking, ion transport, and protein translocation/integration (
Rucevic et al., 2011). However, because membrane proteins are relatively insoluble in water and low in abundance, they are challenging to analyze by traditional MS-based proteomics approaches. Numerous efforts have been made to improve the solubility and enrichment of membrane proteins during sample preparation. Several comprehensive studies recently covered the commonly used technologies in membrane proteomics and different strategies that circumvent technical issues specific to the membrane (
Gilmore and Washburn, 2010;
Groen and Lilley, 2010;
Helbig et al., 2010;
Weekes et al., 2010;
Griffin and Schnitzer, 2011). Recently, Sun et al. reported using 1-butyl-3-methyl imidazolium tetrafluoroborate (BMIM BF4), an ionic liquid (IL), as a sample preparation buffer for the analysis of integral membrane proteins (IMPs) by microcolumn reversed phase liquid chromatography (μRPLC)-electrospray ionization tandem mass spectrometry (ESI-MS/MS). The authors compared BMIM BF4 with other commonly used solvents, such as sodium dodecyl sulfate, methanol, Rapigest, and urea, and found that the number of identified IMPs from rat brain extracted with the IL was significantly increased. The improved identifications could be due to the higher thermal stability of BMIM BF4, and thus its higher solubilizing ability for IMPs, which provided better compatibility with tryptic digestion than traditionally used solvent systems (
Spirin et al., 2011). In addition to characterization of the membrane proteome, the investigation of PTMs on membrane proteins is equally important for the characterization of disease markers and drug treatment targets. Phosphorylation and glycosylation are the two most important PTMs of membrane proteins. In many membrane protein receptors, the cytoplasmic domains can be phosphorylated reversibly and function as signal transducers, whereas the receptor activities of the extracellular domains are mediated via
N-linked glycosylation. Wiśniewski
et al. provides an informative summary on recent advances in proteomic technology for the identification and characterization of these modifications (
Wiśniewski, 2011). Our group has pioneered the development of detergent assisted lectin affinity chromatography (DALAC) for the enrichment of hydrophobic glycoproteins using mouse brain extract (
Wei et al., 2010a). We compared the binding efficiency of lectin affinity chromatography in the presence of four commonly used detergents and determined that, at certain concentrations, detergents can minimize nonspecific binding and facilitate the elution of hydrophobic glycoproteins. In summary, NP-40 was suggested as the most suitable detergent for DALAC because of its higher membrane protein recovery, glycoprotein recovery, and membranous glycoprotein identifications compared with the other detergents tested. In a different study of the mouse brain membrane proteome, Zhang et al. reported an optimized protocol using electrostatic repulsion hydrophilic interaction chromatography (ERLIC) for the simultaneous enrichment of glyco- and phosphopeptides from a mouse brain membrane protein preparation (
Zhang et al., 2011a). Using this protocol, they successfully identified 544 unique glycoproteins and 922 glycosylation sites, which were significantly higher than those using the hydrazide chemistry method. Additionally, a total of 383 phosphoproteins and 915 phosphorylation sites were identified, suggesting that the ERLIC separation has the potential for simultaneous analysis of both glyco- and phosphoproteomes.
Peptidomics
Peptidomics can be loosely defined as the study of the low molecular weight fraction of proteins encompassing biologically active endogenous peptides, protein fragments from endogenous protein degradation products, or other small proteins such as cytokines and signaling peptides. Studies can involve endogenous peptides (
Hui et al., 2011), peptidomic profiling (
Jahn et al., 2011), and
de novo sequencing of peptides (
Chen et al., 2009;
Ma et al., 2009). Neuropeptidomics focuses on biologically active short peptides and has been investigated in numerous species including
Rattus (
Dowell et al., 2006;
Wei et al., 2006),
Mus musculus (
Che and Fricker, 2005;
Fricker, 2010),
Bos taurus (
Colgrave et al., 2011), Japanese quail diencephalon (
Scholz et al., 2010), and invertebrates (
Fu and Li, 2005;
Hummon et al., 2006;
Chen et al., 2010;
Vilim et al., 2010). Peptides are typically isolated with molecular weight cut-off filters from biofluids such as CSF or plasma, or from tissue extracts. If the protein and peptide content is high, as for tissue or cell lysates, proteins can be precipitated with high concentrations of organic solvent and the resulting supernatant analyzed for extracted peptides; the extraction solvent and conditions can have a significant effect on which endogenous peptides are extracted from tissue (
Altelaar et al., 2009). A comparative peptidomic study of human cell lines highlights the utility of finding peptide signatures as potential biomarkers (
Gelman et al., 2011). A thorough discussion of endogenous peptides and neuropeptides is beyond the scope of this review; an excellent review on this topic is available elsewhere (
Li and Sweedler, 2008).
Fractionation and separation
The mass spectrometer has a limited duty cycle and data dependent analysis can only scan a limited number of m/z peaks at any given time. In addition, significant ion suppression can occur if there is a difference in concentration between co-eluting peptides, or if too many peptides co-elute. Therefore, one of the biggest challenges in biomarker discovery is the complexity of the sample and the presence of high-abundance proteins in body fluids such as CSF, serum, and plasma. In addition to the removal of the most abundant proteins by immunodepletion, the reduction of the complexity of the sample by further fractionation is indispensable to facilitate the characterization of unidentified biomarkers from the low abundance proteins. Traditionally used techniques for complex protein analysis include: gel based fractionation methods such as two-dimensional gel electrophoresis (2D-GE) and its variation two-dimensional differential gel electrophoresis (2D-DIGE), or non-gel based, such as one- or multidimensional liquid chromatography (LC), and microscale separation techniques such as capillary electrophoresis (CE).
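The consequence of a limited duty cycle can be made concrete with a toy model of data-dependent precursor selection. The sketch below assumes a "top-N" scheme with a simple exclusion list, which is a common acquisition strategy but is not described in detail in this review; the class, function, parameter names, and peak values are all hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    mz: float
    intensity: float


def select_precursors(survey_scan, top_n=3, excluded_mz=(), tol=0.01):
    """Pick the top_n most intense peaks that are not within tol of an excluded m/z."""
    candidates = [
        peak for peak in survey_scan
        if all(abs(peak.mz - excluded) > tol for excluded in excluded_mz)
    ]
    return sorted(candidates, key=lambda peak: peak.intensity, reverse=True)[:top_n]


# Hypothetical survey scan of five co-eluting peptide ions.
survey_scan = [
    Peak(400.2, 1e6),
    Peak(512.3, 5e4),
    Peak(623.8, 8e5),
    Peak(701.4, 2e3),
    Peak(845.9, 3e5),
]

first_cycle = select_precursors(survey_scan, top_n=3)
print([peak.mz for peak in first_cycle])

# Excluding an already-fragmented ion lets a weaker peak be sampled next.
second_cycle = select_precursors(survey_scan, top_n=3, excluded_mz=(400.2,))
print([peak.mz for peak in second_cycle])
```

In this model the weakest ion is never selected while more intense ions co-elute, which is one way to see why depletion and fractionation before MS improve coverage of low-abundance species.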
2D-GE MS has been widely used as a powerful tool to separate proteins and identify differentially expressed proteins ever since 2D gels were coupled to mass spectrometry. In 2D-GE MS thousands of proteins can be separated on a single gel according to p
I and molecular weight. Individual protein spots that show differences in abundance between samples can then be excised from the gel, digested into peptides, and analyzed by MALDI MS or by liquid chromatography tandem mass spectrometry (LC-MS/MS) for protein identification. The introduction of 2D-DIGE adds a quantitative strategy to gel electrophoresis by enabling multiple protein extracts to be separated on the same 2D gel, thus providing comparative analysis of proteomes in complex samples. In 2D-DIGE, protein extracts from two different conditions and an internal standard are labeled with fluorescent dyes, for example Cy3, Cy5, and Cy2, respectively, prior to two-dimensional gel electrophoresis. Compared to traditional 2D-GE, 2D-DIGE provides the clear advantage of overcoming the inter-gel variation problem (
Marouga et al., 2005). Proteomic profiling of CSF by 2D-GE and 2D-DIGE has led to the identification of putative biomarkers in multiple neurological disorders. For example, Brechlin
et al. reported an optimized 2D-DIGE protocol and used it to profile CSF from 36 CJD patients. The applicability of their approach was demonstrated by the detection of known CJD biomarkers such as 14-3-3 protein, neuron-specific enolase, and lactate dehydrogenase, as well as other proteins that are potentially relevant to CJD (
Brechlin et al., 2008). In another study to identify novel CSF biomarkers for multiple sclerosis, CSF from 112 multiple sclerosis patients and control individuals was analyzed by 2D-GE MS for comparative proteomics. Ten potential multiple sclerosis biomarkers were selected for validation by immunoassay (
Ottervald et al., 2010). These methodologies, sample preparation techniques, and applications of 2D-DIGE in neuroproteomics were reviewed by Diez
et al. (
Diez et al., 2010). Although 2D gel electrophoresis provides excellent resolving power and the capability to visualize abundance changes, the method has some limitations. For example, gel-based separation is not suitable for low abundance proteins, extremely basic or acidic proteins, very small or large proteins, and hydrophobic proteins (
Lilley et al., 2002;
Oh-Ishi and Maeda, 2002).
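The way an internal standard in 2D-DIGE compensates for inter-gel variation can be sketched numerically. The example below is an illustration under stated assumptions rather than the procedure of any cited study: spot volumes are hypothetical, Cy3 and Cy5 are assigned to the two conditions and Cy2 to the pooled internal standard as in the example dye scheme above, and real analyses typically work with log ratios and statistical testing across replicates.

```python
def standardize_spot(spot):
    """Express Cy3 and Cy5 spot volumes relative to the Cy2 internal standard."""
    return {
        "condition_a": spot["cy3"] / spot["cy2"],
        "condition_b": spot["cy5"] / spot["cy2"],
    }


# Hypothetical volumes for the same protein spot on two gels.
# Gel 2 is assumed to give uniformly twice the signal (e.g. loading or scanning).
gels = {
    "gel_1": {"cy2": 1000.0, "cy3": 500.0, "cy5": 1500.0},
    "gel_2": {"cy2": 2000.0, "cy3": 1000.0, "cy5": 3000.0},
}

for name, spot in gels.items():
    print(name, standardize_spot(spot))
```

Although the raw volumes differ two-fold between the gels, the standardized abundances agree, because the same internal standard co-migrates with the samples on every gel.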
Complementary to gel-based approaches, shotgun proteomics coupled to LC has become increasingly popular in proteomic research because it is reproducible, highly automated, and capable of detecting low abundance proteins. Another advantage of LC-MS shotgun proteomics is its suitability for isotope labeling for protein quantification, which is reviewed in a later section. In shotgun proteomics, a protein mixture is digested and the resulting peptides are separated by LC prior to tandem MS fragmentation, so that proteins are identified by peptide sequencing. The most common separation for shotgun proteomics, peptidomics, or top-down proteomics experiments uses low-pH reversed phase (RP) C18, C8, or C4 columns. RPLC is well established: it provides high resolution, removes salts that can interfere with ionization, and uses a mobile phase that is compatible with ESI. Nanoscale C18 columns allow for separation and introduction of sub-microgram samples. If larger amounts of sample are available, two-dimensional separations are usually preferred to greatly enhance the coverage of the investigated proteome, which will be discussed in depth later. It is preferable to have an orthogonal separation method, and since RP separates by hydrophobicity, strong cation exchange (SCX) was the original choice because it separates by charge. MudPIT (multidimensional protein identification technology) usually refers to the use of SCX as the first dimension of separation and is a well-established platform (
Washburn et al., 2001). SCX has the advantage over RP separation technologies of effectively removing interfering detergents from the sample. SCX separation is not based solely on charge; hydrophobicity also contributes to elution, so a small amount of organic modifier, usually 10-15%, is added to lessen hydrophobic effects (
Burke et al., 1989). The amount of organic modifier needs to be minimized; otherwise, binding to the C18 trap cartridge or C18 column will be reduced if the separation is performed online. SCX can also be used for PTM analysis and offers specific applications for proteomic studies; an excellent, current review of this subject is available elsewhere (
Edelmann, 2011). An alternative MudPIT separation scheme employing high pH RPLC as the first phase of separation and low pH RPLC in the second dimension (RP-RP) has been successfully applied to the proteomic analysis of complex biologic samples (
Guiochon et al., 2008;
François et al., 2009). The advantages of using RP as the first dimension are the higher separation resolution and the better compatibility with downstream MS detection that comes from eliminating salt. Song et al. reported a phosphoproteome analysis based on this 2D RP-RP coupling scheme (
Song et al., 2010).
Hydrophilic interaction chromatography (HILIC) employs a distinct separation modality in which the retention of peptides increases with increasing polarity (
Alpert, 1990). Samples are loaded in a high percentage of organic solvent and eluted by increasing the percentage of the aqueous phase, and thus the polarity of the mobile phase. This is the opposite of RPLC, which establishes the orthogonality of the two separation modes (
Gilar et al., 2005). HILIC has quickly become a very useful method and is actively used for proteomic experiments (
Di Palma et al., 2011b) for increased sensitivity (
Di Palma et al., 2011a), phosphoproteomics (
Zarei et al., 2011), glycoproteins (
Neue et al., 2011), and quantification studies (
Ow et al., 2011). A modification of HILIC is electrostatic repulsion-hydrophilic interaction chromatography (ERLIC), which adds electrostatic interactions as an additional mode of separation. An earlier study using ERLIC demonstrated the ability to separate phosphopeptides from non-phosphorylated peptides at pH 2 (
Alpert, 2008). A recent study of phosphoproteome changes in Marek’s disease applied ERLIC to chicken embryonic fibroblast lysate, but phosphopeptides made up only 1.3% of all identified peptides. Because ERLIC alone did not isolate the phosphopeptides, the investigators performed immobilized metal affinity chromatography (IMAC) enrichment on the fractions, which increased phosphopeptide identifications more than 50-fold (
Chien et al., 2011). A comparative study of ERLIC to HILIC and SCX following TiO2 phospho-enrichment reported that SCX > ERLIC > HILIC for phosphopeptide identifications (
Zarei et al., 2011).
Recent developments in instrumentation that combine LC with ion mobility spectrometry (IMS) and MS (LC-IMS-MS) offer advantages over conventional LC because of the rapid, high-resolution separation of analytes based on their charge, mass, and shape, as reflected by their mobility in a given buffer gas. The mobility of an ion in a buffer gas is determined by the ion’s charge and its collision cross-section with the buffer gas. The methodologies of IMS separations and the application of LC-IMS-MS to the proteomic analysis of complex systems, including human plasma, have been reviewed by Clemmer’s group (
Liu et al., 2004b;
Valentine et al., 2005;
Valentine et al., 2006). They proposed a method that employs intrinsic amino acid size parameters to obtain ion mobility predictions which can be used to rank candidate peptide ion assignments and significantly improve peptide identification (
Valentine et al., 2011).
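The dependence of mobility on charge and collision cross-section described above is commonly written, in the low-field limit, as the Mason-Schamp relation. It is given here only as an illustrative sketch; the symbols are not taken from the cited studies, and the relation assumes a dilute buffer gas and a weak electric field.

```latex
K = \frac{3\,z e}{16\,N} \sqrt{\frac{2\pi}{\mu\, k_{B} T}} \; \frac{1}{\Omega}
% K      : ion mobility
% z e    : ion charge (charge state z times the elementary charge e)
% N      : number density of the buffer gas
% \mu    : reduced mass of the ion and buffer gas pair
% k_B T  : Boltzmann constant times gas temperature
% \Omega : collision cross-section of the ion with the buffer gas
```

Under these assumptions, a higher charge state increases the mobility, whereas a larger collision cross-section decreases it, which is why ions of similar m/z but different shape can be resolved.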
Although 2D gel and LC are routinely used as separation techniques in MS-based proteomics, capillary electrophoresis (CE) has received increasing attention as a promising alternative due to the fast and high-resolution separation it offers. CE has a wide variety of operation modes, among which capillary zone electrophoresis (CZE) and capillary isoelectric focusing (CIEF) have the greatest potential for MS-based proteomics and are therefore highlighted here. CZE separates analytes by their charge-to-size ratios in buffers under a high electrical field and is often used as the final dimension prior to MS analysis, whereas CIEF separates by isoelectric point and is better suited as the first dimension of separation. Detailed descriptions of different CE-MS interfaces, sample preconcentration, and capillary coatings that minimize analyte adsorption can be found in several reviews (
Simpson and Smith, 2005;
Huck et al., 2006;
Haselberg et al., 2007,
2011;
Ahmed, 2009a;
Fonslow and Yates, 2009;
Klampfl, 2009). CE is complementary to conventional LC in that it is suitable for the analysis of polar and chargeable compounds. Dovichi’s group conducted proteomic analysis of the secreted protein fraction of
Mycobacterium marinum which has intermediate protein complexity (
Li et al., 2012). The tryptic digests were either analyzed by UPLC-ESI-MS/MS in triplicate or prefractionated by RPLC followed by CZE-ESI-MS/MS. It was demonstrated that the two methods identified similar numbers of peptides and proteins within similar analysis times. However, CZE-ESI-MS/MS analysis of the prefractionated sample tended to identify more peptides that are basic and have lower
m/z values than those identified by UPLC-ESI-MS/MS. This analysis also presented the largest number of protein identifications obtained using CE-MS/MS, suggesting that prefractionation of complex samples by LC prior to CZE-ESI-MS/MS is effective. The use of CIEF as the first dimension of separation provides both sample concentration and excellent resolving power. The combination of CIEF and RPLC separation has been applied to proteomic analyses in which the amount of protein sample is limited and cannot meet the minimum load requirement for 2D LC-MS/MS (
Chen et al., 2003;
Dai et al., 2010). So far CE-MS has been widely applied to the proteomic analysis of various biologic samples such as urine (
Mischak et al., 2010;
Albalat et al., 2011b), CSF (
Zuberovic et al., 2008), blood (
Johannesson et al., 2007), frozen tissues (
Guo et al., 2006), and the formalin-fixed and paraffin-embedded (FFPE) tissue samples (
Hwang et al., 2007). The recent CE–MS applications to clinical proteomics have been summarized in several reviews (
Desiderio et al., 2010;
Ahmed, 2009a;
Albalat et al., 2011a).
Protein quantification
In 2D gel electrophoresis, the quantitative analysis of protein mixtures is performed on the gel by comparing the intensity of the protein stain. The development of 2D-DIGE eliminated the gel-to-gel variation and greatly improved the quantitative capability and reliability of 2D gel methodology (
Marouga et al., 2005). However, the accuracy of 2D gel based protein quantification is limited because a seemingly single gel spot often contains multiple proteins and because proteins with extreme molecular weights and pI values, as well as highly hydrophobic proteins such as membrane proteins, are difficult to detect. Therefore, non-gel based shotgun proteomics technology is more suitable for accurate and large-scale protein identification and quantification in complex samples. Briefly, the quantification in non-gel based shotgun proteomics can be categorized into two major approaches: stable isotope labeling-based and label-free methods. The common strategies for quantitative proteomic analysis are reviewed and summarized in Table 1.
Isotope labeling methods
Because stable isotope-labeled peptides have the same chemical properties as their unlabeled counterparts, the two peptides within a mixture should exhibit identical behaviors in MS ionization. The mass difference introduced by isotope labeling enables MS to detect the pair as two distinct peptide masses within the mixture, allowing the relative abundance of the two peptides to be measured. Depending on how isotopes are incorporated into the protein or peptide, these labeling methods can be divided into two groups: in vitro chemical derivatization techniques, which incorporate a label or tag into the peptide or protein during sample preparation, and metabolic labeling techniques, which introduce the isotope label directly into the organism via isotope-enriched nutrients from food or media.
In vitro derivatization techniques
There are multiple methods to introduce heavy isotopes into proteins or peptides in vitro. The commonly used strategies include 18O/16O enzymatic labeling, Isotope-Coded Affinity Tag (ICAT), Tandem Mass Tags (TMTs), and Isobaric Tags for Relative and Absolute Quantification (iTRAQ). The 18O labeling method enzymatically cleaves the peptide bond with trypsin in the presence of 18O-enriched H2O and introduces a 4 Da mass shift in the tryptic peptides (
Ye et al., 2009). The advantages of this method are that 18O-enriched water is extremely stable, all tryptic peptides are labeled with the same mass shift, and the secondary reactions inherent to other chemical labeling methods are avoided. Conversely, widespread use of 18O-labeling has been hindered by the difficulty of attaining complete 18O incorporation and by a lack of robustness (
Staes et al., 2004;
Jorge et al., 2009). Currently, ICAT, TMTs and iTRAQ methods are extensively used in quantitative proteomics. In ICAT, cysteine residues are specifically derivatized with a reagent containing either zero or eight deuterium atoms as well as a biotin group for affinity purification of cysteine-containing peptides (
Gygi et al., 1999;
Haqqani et al., 2008). The advantage of ICAT is that affinity purification via the biotin moiety can facilitate the detection of low-abundance cysteine-containing peptides. However, the mass difference introduced by labeling increases mass spectral complexity, because quantification from the different precursor masses is done by MS while peptide identification is achieved through tandem MS (MS/MS). This added complexity from different peptide masses was addressed by using isobaric labeling methods such as TMTs and iTRAQ (
Thompson et al., 2003;
Ross et al., 2004), in which the same peptides in different samples are isobaric after tagging and appear as a single
m/z in MS scans, thus enhancing the peptide limit of detection and reducing the MS scan complexity. Isobaric labeling reagents are composed of a primary amine reactive group and an isotopic reporter group linked by an isotopic balancer group that normalizes the total mass of the tags. The reporter group is used for quantification because it is cleaved during collision-induced dissociation (CID) to yield a characteristic isotope-encoded fragment. Moreover, isobaric labeling methods allow the comparison of multiple samples within a single experiment. Recently, a 6-plex version of TMTs was reported (
Dayon et al., 2008), and iTRAQ enables up to eight samples to be labeled and relatively quantified in a single experiment (
D’Ascenzo et al., 2008). 8-plex iTRAQ reagents have been used for the comparison of complicated biologic samples such as CSF in the studies of neurodegenerative diseases (
Choe et al., 2007). Recently, our group developed novel 4-plex N,N-dimethyl leucine (DiLeu) isobaric tandem mass (MS2) tagging reagents with high quantitation efficacy. DiLeu has the advantage of synthetic simplicity and greatly reduced synthesis cost compared to TMTs and iTRAQ (
Xiang et al., 2010). Xiang et al. (2010) demonstrated that DiLeu performed comparably to iTRAQ in protein sequence coverage (~43%) and quantitation accuracy (<15%) for tryptically digested proteins. More importantly, DiLeu reagents could promote enhanced fragmentation of labeled peptides, thus allowing more confident peptide and protein identifications.
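As a minimal sketch of how reporter ions translate into relative quantification, the example below divides the reporter-ion intensity of each channel in one MS/MS spectrum by that of a chosen reference channel. The channel names, the intensities, and the function name `reporter_ratios` are hypothetical and are not taken from any of the cited reagent kits or software; real workflows also correct for isotopic impurities and combine many spectra per protein.

```python
from typing import Dict


def reporter_ratios(intensities: Dict[str, float], reference: str) -> Dict[str, float]:
    """Return each channel's reporter-ion intensity relative to a reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: value / ref for channel, value in intensities.items()}


# Hypothetical reporter-ion intensities from one MS/MS spectrum of a 4-plex experiment.
spectrum = {"ch1": 2000.0, "ch2": 4000.0, "ch3": 1000.0, "ch4": 2000.0}
ratios = reporter_ratios(spectrum, reference="ch1")
print(ratios)
```

Because the tagged peptides are isobaric, all four channels come from a single precursor selection, so the ratios are read from one spectrum rather than from separate precursor peaks.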
In vivo metabolic labeling
Metabolic processes can also be employed to incorporate stable-isotope labels into proteins or organisms by enriching culture media or food with light or heavy versions of isotope labels (2H, 13C, 15N). The advantage of in vivo labeling is that it does not suffer from incomplete labeling, which is an inherent drawback of in vitro derivatization techniques. In addition, metabolic labeling occurs from the start of the experiment, and proteins with light or heavy labels are simultaneously extracted, thus reducing the error and variability of quantification introduced during sample preparation. The most widely used strategy for metabolic labeling is known as stable-isotope labeling by amino acids in cell culture (SILAC), which was introduced by Mann and coworkers (
Ong et al., 2002;
Ong et al., 2003). In SILAC, one cell population is grown in normal, or light, media, while the other is grown in heavy media enriched with a heavy isotope-encoded (typically 13C or 15N) amino acid, such as arginine or leucine. Cells from the two populations are then combined; proteins are extracted, digested, and analyzed by MS. The relative protein expression differences are then determined from the extracted ion chromatograms of the light and heavy peptide forms. SILAC has been shown to be a powerful tool for the study of intracellular signal transduction. In addition, this technique has recently been applied to the quantitative analysis of phosphotyrosine (pTyr) proteomes to characterize pTyr-dependent signaling pathways (
Pimienta et al., 2009;
Zhang and Neubert, 2009).
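The comparison of light and heavy extracted ion chromatograms can be sketched as follows. This is an illustrative calculation only: the retention times and intensities are invented, the helper names `peak_area` and `heavy_to_light_ratio` are hypothetical, and simple trapezoidal integration is assumed in place of the peak detection and fitting performed by dedicated software.

```python
from typing import Sequence


def peak_area(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Trapezoidal area under an extracted ion chromatogram."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (intensities[i] + intensities[i - 1]) * (times[i] - times[i - 1])
    return area


def heavy_to_light_ratio(times: Sequence[float],
                         light: Sequence[float],
                         heavy: Sequence[float]) -> float:
    """Ratio of heavy to light peak areas for one co-eluting peptide pair."""
    light_area = peak_area(times, light)
    if light_area == 0:
        raise ValueError("light peptide was not detected")
    return peak_area(times, heavy) / light_area


# Invented chromatograms for one peptide pair, sampled at the same retention times.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
light = [0.0, 10.0, 20.0, 10.0, 0.0]
heavy = [0.0, 20.0, 40.0, 20.0, 0.0]
ratio = heavy_to_light_ratio(times, light, heavy)
print(ratio)
```

In practice, ratios from several peptides of the same protein would be combined, for example by taking their median, before expression differences are reported.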
Label-free quantification
Although various isotope labeling methods have provided powerful tools for quantitative proteomics, these approaches have several limitations. Labeling increases the cost and complexity of sample preparation and introduces potential errors during the labeling reaction. It also requires a higher sample concentration and complicates data processing and interpretation. In addition, so far only TMTs and iTRAQ allow the comparison of multiple (up to eight) samples simultaneously, so more than eight samples cannot be compared in a single experiment by isotope labeling. To address these concerns, there has been significant interest in the development of label-free quantitative approaches. Current label-free quantification methods for MS-based proteomics were developed based on the observation that the chromatographic peak area of a peptide (
Bondarenko et al., 2002;
Chelius and Bondarenko, 2002) or frequency of MS/MS spectra (
Liu et al., 2004a) correlates with the protein or peptide concentration. Therefore, the two most common label-free quantification approaches compare either (i) the area under the curve (AUC) of any given peptide (
Wang et al., 2003;
Silva et al., 2005) or (ii) the frequency of MS/MS spectra assigned to a protein, commonly referred to as spectral counting (
Zybailov et al., 2005). Several recent reviews provided detailed and comprehensive knowledge comparing label-free methods with labeling methods, data processing and commercially available software for label-free quantitative proteomics (
Zhu et al., 2010;
Neilson et al., 2011;
Xie et al., 2011;
Filiou et al., 2012).
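To illustrate spectral counting, the sketch below normalizes the number of MS/MS spectra assigned to each protein by protein length and then by the total across all proteins, a scheme often called the normalized spectral abundance factor. The choice of this particular normalization is an assumption made for illustration; the cited studies may normalize differently, and the protein names, counts, and lengths are invented.

```python
from typing import Dict


def normalized_spectral_abundance(counts: Dict[str, int],
                                  lengths: Dict[str, int]) -> Dict[str, float]:
    """Length-normalized spectral counts, scaled so that all values sum to 1."""
    per_residue = {protein: counts[protein] / lengths[protein] for protein in counts}
    total = sum(per_residue.values())
    if total == 0:
        raise ValueError("no spectra were assigned to any protein")
    return {protein: value / total for protein, value in per_residue.items()}


# Invented spectral counts and protein lengths (in residues) from one LC-MS/MS run.
counts = {"protA": 30, "protB": 10, "protC": 10}
lengths = {"protA": 300, "protB": 100, "protC": 200}
abundance = normalized_spectral_abundance(counts, lengths)
print(abundance)
```

Dividing by length compensates for the fact that longer proteins yield more peptides, and therefore more spectra, at the same molar abundance; here protA and protB receive the same value although protA has three times as many spectra.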
Dissociation techniques
The vast majority of proteomic experiments have proteins or peptides being identified by two critical pieces of data obtained from the mass spectrometer. The first is the precursor ion identified by its
m/z, which reflects the mass of the peptide being analyzed. The second is the use of tandem mass spectrometry to fragment or dissociate the precursor ion and record the generated fragment ion pattern to discern the amino acid sequence. The three most popular dissociation or fragmentation techniques for peptides are CID, electron-transfer dissociation (ETD), and higher-energy collisional dissociation (HCD). A recent study on the human plasma proteome demonstrated that combined fragmentation techniques enhance coverage by providing complementary information for identifications. CID enabled the greatest number of protein identifications, while HCD identified an additional 25% proteins and ETD contributed an additional 13% protein identifications (
Shen et al., 2011).
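How a fragment ion pattern encodes sequence can be shown with a short calculation of the singly charged b- and y-type ions expected from collisional dissociation of an unmodified peptide. This is a sketch under stated assumptions: monoisotopic residue masses, charge state 1+, no PTMs, and an arbitrary example sequence; the function name `b_y_ions` is hypothetical. ETD-type c and z ions would require different mass offsets and are not computed here.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def b_y_ions(peptide: str):
    """Return m/z lists [b1..b(n-1)] and [y1..y(n-1)] for singly charged ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b ions contain the first i residues; y ions contain the last i residues plus water.
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = b_y_ions("PEPTIDE")  # arbitrary example sequence
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

The difference between consecutive ions of the same series equals one residue mass, which is what allows a search algorithm, or a de novo sequencing tool, to read the sequence from the spectrum.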
ETD/ECD
Electron capture dissociation (ECD) (
Zubarev et al., 1998) preceded ETD, but ECD was developed for use in the Penning trap of Fourier transform ion cyclotron resonance (FTICR) mass spectrometers. ECD requires the ion of interest to be in contact with near-thermal electrons, and the electron capture event occurs on the millisecond time scale; the Paul traps and quadrupoles used in the majority of mass spectrometers cannot trap electrons for that long (
Syka et al., 2004). ETD instead transfers an electron from a radical anion with low electron affinity, such as fluoranthene, to the peptide cation, which results in more uniform cleavage along the peptide backbone. The cation accepts an electron, and the newly formed odd-electron protonated peptide fragments by cleavage of the N-Cα bond, yielding c- and z•-type product ions. The uniform cleavage reduces the sequence discrimination caused by labile bonds such as those of PTMs and also provides improved sequencing of larger peptides compared to CID (
Xia et al., 2007). The realization that ETD produced better-quality MS/MS spectra than CID for larger peptides led to a decision tree analysis strategy in which peptide charge state and size determine whether the precursor peptide is fragmented with CID or ETD (
Swaney et al., 2008). One of the main benefits of ETD/ECD is the ability to sequence peptides with labile PTMs such as phosphorylation (
Chi et al., 2007;
Zhou et al., 2009), sulfation (
Liu and Håkansson, 2006), glycosylation (
Snovida et al., 2010), ubiquitination (
Sobott et al., 2009), and histone modifications (
Eliuk et al., 2010). ETD also has the benefit of providing better sequence information on larger neuropeptides when compared to CID (
Hui et al., 2011). However, a thorough analysis suggested that CID still yielded more peptide/protein identifications than ETD in large scale proteomics (
Molina et al., 2008).
HCD
Higher-energy collisional dissociation (HCD) (
Olsen et al., 2007) is an emerging fragmentation technique that offers improved detection of small reporter ions from iTRAQ-based studies (
Dayon et al., 2010;
McAlister et al., 2010). Unlike ion trap CID, HCD is performed at a higher energy in a collision cell, so it does not suffer from the low-mass cutoff limitation. Furthermore, HCD offers enhanced fragmentation efficiency, assisting in MS/MS spectra interpretation and protein identification (
Second et al., 2009). A major drawback for HCD is that the spectral acquisition times are up to twofold longer due to increased ion requirement for Fourier transform detection in the orbitrap (
Jedrychowski et al., 2011). HCD has been reported to increase phosphopeptide identifications over CID (
Nagaraj et al., 2010), but in a different study CID was reported to offer more phosphopeptide identifications over HCD (
Jedrychowski et al., 2011). The decision tree strategy has also been adapted for HCD, essentially replacing CID with HCD; the authors reported better-quality data, as judged by higher Mascot scores, and more peptide identifications (
Frese et al., 2011).
MSE
Data dependent acquisition (DDA) is the most commonly used ion selection process in mass spectrometers for proteomic experiments. An alternative process that neither selects ions nor switches between MS and MS/MS modes is termed MSE. MSE is a data independent mode and does not require precursor ions of a significant intensity to be selected for MS/MS analysis (
Chakraborty et al., 2007). In a data independent mode, the mass spectrometer does not choose which precursor ions to fragment or when they are fragmented. MSE works by alternating low and high energy scans without any ion isolation. In the low energy scan the precursor ions are not fragmented, whereas the high energy scan induces fragmentation. The resulting mix of precursor and fragmentation ions is then detected simultaneously (
Ramos et al., 2006). The data then need to be deconvoluted using a proprietary, time-aligned algorithm that is discussed elsewhere (
Barbara and Castro-Perez, 2011). The continuous data independent acquisition allows multiple MS/MS spectra to be collected during the natural analyte peak broadening observed in chromatography, which provides more data points for AUC label-free quantification. In addition, lower abundance peptides can be sequenced, as more MS/MS spectra are collected throughout the elution of an LC peak, allowing better signal averaging for smaller analyte peak of interest during coelution and reducing sampling bias in typical DDA experiments where only more abundant peaks can be selected for fragmentation.
A comparison of spiked internal protein standards into a complex protein digest provided evidence that MS
E was comparable to DDA analysis in LC-MS (
Geromanos et al., 2009). MS
E has been used for label free proteomics of immunodepleted serum in large scale proteomics samples (
Koutroukides et al., 2011). In addition, MS
E was performed for the characterization of human cerebellum and primary visual cortex proteomes. Hundreds of proteins were identified, including many previously reported in neurological disorders (
Martins-de-Souza et al., 2012). MS
E is quickly becoming a versatile data acquisition method, recently used in such studies as cancer cells (
Scatena et al., 2010), schizophrenia (
Herberth et al., 2011), and pituitary proteome discovery (
Krishnamurthy et al., 2011). The usefulness of MS
E as an unbiased data acquisition method is being assimilated into multiple proteomics studies including studies involving neurological disorders.
Data analysis
One of the major bottlenecks in non-targeted proteomic experiments is how to handle the enormous amount of data obtained. Database searches, biostatistical analysis, de novo sequencing, and PTM validation all have their place, and multiple platforms are available.
If the organism being studied has had its genome sequenced, databases can be created with a list of proteins in the FASTA format to be used in database searching. There are numerous database searching algorithms for sequence identification of MS/MS data including Mascot (
Perkins et al., 1999), Sequest (
Eng et al., 1994), X!Tandem (
Craig and Beavis, 2004), OMSSA (
Geer et al., 2004), and PEAKS (
Zhang et al., 2011b). These searching algorithms work by matching MS/MS spectra and precursor masses to sequences found within proteins. How well the observed spectra match the theoretical spectra determines a score, which is specific to the searching algorithm and can usually be related to the probability of a random hit. Recently, a database has been developed for PTM analysis by the use of the program SIMS (
Liu et al., 2008). Specifically for phosphopeptides, Ascore’s algorithm scans the MS/MS data to determine the likelihood of correct phosphosite identification from the presence of site identifying product ions (
Beausoleil et al., 2006). If the organism that is being analyzed has not had its genome sequenced and no (or very limited) FASTA database is available, a homology search can be performed using SPIDER (
Han et al., 2005) available with PEAKS software. Alternatively, individual MS/MS spectra can be
de novo sequenced, but software is available to perform automated
de novo sequencing of numerous spectra (PEAKS, (
Geer et al., 2004) DeNovoX, and PepSeq).
For large-scale protein identifications, the false discovery rate (FDR) must be established by the searching algorithm. This is accomplished by re-searching the data against a false (decoy) database created by reversing or scrambling the amino acid sequences of the original database used for the protein search. Any hits from the false database contribute to the FDR, and the FDR threshold can be adjusted, usually to around 1%. An additional layer of confidence in the obtained data can be achieved in shotgun proteomics experiments by removing all the proteins that are identified by only one peptide.
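The false-database approach described above can be sketched as follows. This is a simplified illustration rather than the procedure of any particular search engine: the FDR is estimated as the number of decoy hits divided by the number of target hits at or above a score threshold, which is one common estimator among several, and the scores, labels, and function names are invented for the example.

```python
from typing import List, Optional, Tuple

# Each peptide-spectrum match (PSM) is (score, is_decoy); the values below are made up.
PSM = Tuple[float, bool]


def estimate_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_cutoff(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Lowest observed score whose estimated FDR does not exceed max_fdr."""
    passing = [s for s in {score for score, _ in psms} if estimate_fdr(psms, s) <= max_fdr]
    return min(passing) if passing else None


psms = [(90.0, False), (85.0, False), (80.0, False),
        (70.0, True), (60.0, False), (50.0, True)]
print(estimate_fdr(psms, 80.0))   # no decoys score this high
print(lowest_cutoff(psms, 0.25))  # a loose 25% limit, chosen only because the list is tiny
```

With a realistic number of matches, the same functions would be called with `max_fdr=0.01` to apply the 1% threshold mentioned in the text.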
Once a set of confident proteins or peptides has been generated from database searching, bioinformatic analysis or biostatistical analysis is needed. Numerous software packages are available for different purposes. FLEXIQuant is one example, used for absolute quantitation of isotopically labeled proteins or peptides of interest (
Singh et al., 2009). FDR analysis of phosphopeptides or other specific PTMs can be adjusted with such software as Scaffold, providing data consisting only of a specific modification (
Searle, 2010). Bioinformatic tools, such as Scaffold or ProteoIQ, also include gene ontology (GO) analysis, which can classify identified proteins by three categories: cellular component, molecular function, or biologic process. Custom bioinformatics programs can also be developed and are often useful in various proteomic studies, including biomarker discovery in neurological diseases (
Herbst et al., 2009). More detailed review of bioinformatics in peptidomics (
Menschaert et al., 2010) and proteomics (
Kumar and Mann, 2009) can be found elsewhere.
Validation of biomarkers by targeted proteomics
Validation of putative biomarkers identified by MS-based proteomic analysis is often required, both to provide an orthogonal analysis that rules out false positives from MS and to provide additional evidence for the biomarker candidate(s) in support of potential future clinical assays. At present, antibody-based assays such as Western blotting, ELISA, and immunohistochemistry are the most widely used methods for biomarker validation. Although accurate and well established, these methods rely on protein specific antibodies for the measurement of the putative biomarker and are difficult to apply to all, or even a subset, of the long list of putative protein biomarkers typically obtained by MS-based comparative proteomic analysis. Large scale validation is impractical due to the cost of each antibody, the labor needed to develop a publishable Western blot or ELISA, and the limited availability of antibodies for certain proteins. As an alternative strategy, quantitative assays based on multiple-reaction monitoring (MRM) MS using a triple quadrupole mass spectrometer have been employed in biomarker verification.
MRM is the most common use of MS/MS for absolute quantitation. It is a hypothesis driven experiment in which the peptide of interest and its fragmentation pattern must be known prior to the quantitative MRM experiments. MRM involves isolating a specific precursor m/z in the first quadrupole, fragmenting it in the second quadrupole, and monitoring one or more of the most intense fragment ions in the third quadrupole. The ability to quantitate, and thus validate, proteins or peptides as potential biomarkers is achieved by performing MRM on an isotopically labeled reference peptide for the targeted peptide/protein of interest. The main obstacle for quantification of peptides is interference and ion suppression effects from co-eluting substances. Since the isotopically labeled and native peptides co-elute, the same interference and ion suppression occur for both peptides, so these effects are corrected when the native signal is compared with the labeled signal.
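The quantitation step can be sketched as a single-point calculation in which the amount of native peptide is obtained from the ratio of native to labeled peak areas and the known amount of spiked labeled peptide. The sketch assumes equal response of the two forms and summation over the monitored transitions; the peak areas, the spiked amount, and the function names are hypothetical, and a full assay would normally use a calibration curve.

```python
from typing import Sequence


def summed_area(transition_areas: Sequence[float]) -> float:
    """Total peak area over the monitored transitions of one peptide form."""
    return sum(transition_areas)


def native_amount_fmol(native_area: float, labeled_area: float,
                       labeled_spike_fmol: float) -> float:
    """Native peptide amount from the native/labeled area ratio (equal response assumed)."""
    if labeled_area <= 0:
        raise ValueError("labeled reference peptide was not detected")
    return native_area / labeled_area * labeled_spike_fmol


# Invented peak areas for three transitions of the native and labeled peptide.
native = summed_area([1200.0, 800.0, 400.0])
labeled = summed_area([2400.0, 1600.0, 800.0])
amount = native_amount_fmol(native, labeled, labeled_spike_fmol=50.0)
print(amount)
```

Because both forms experience the same suppression at the same retention time, a matrix effect that halves both areas leaves the computed amount unchanged.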
Peptides need to be systematically chosen for a highly sensitive and reproducible MRM experiment to ensure proper validation of putative biomarkers. Peptides require certain intrinsic properties which include an
m/z within the practical mass detection range for the instrument and high ionization efficiency. If the peptide to be quantified is derived from a digestion, then peptides that show detectable incomplete digestion or a missed cleavage site can be a major source of variability. Peptides containing methionine and, to a lesser extent, tryptophan are traditionally removed from consideration for quantitative MRM experiments because of the variable oxidation that can occur. In addition, if chromatographic separation is performed, the peptide must show well-behaved retention, with little tailing, no late elution that broadens the peak, and no irreversible binding to the column. As an example, hydrophilic peptides eluted from a C18 column may exhibit these problems, and a different chromatographic separation will need to be explored for improved limits of detection, quantitation, and validation. To assess how consistently a peptide is detected, or how useful it is, databases such as Proteomics Database (
Craig et al., 2004), PRIDE (
Jones et al., 2007), PeptideAtlas (
Deutsch et al., 2008) have been developed to compile proteomic data repositories from initial discovery experiments.
After the peptide is selected for analysis the proper MRM transitions need to be selected to optimize the sensitivity and selectivity of the experiment. It is common for investigators to select two or three of the most intense transitions for the proposed experiment. It is imperative that the same instrument is used for the determination of transition ions as different mass spectrometers may have a bias toward different fragment ions.
MRM experiments remain highly popular for hypothesis-directed studies (
Miliotis et al., 2011), biomarker analysis (
Xiang and Koomen, 2012), and validation (
Ossola et al., 2011). Validation of putative biomarkers is increasingly becoming a necessary step when performing large scale non-hypothesis driven proteomics experiments. The traditional validation techniques of ELISA, Western blotting, and immunohistochemistry are still used, but MRM experiments are becoming an attractive alternative for validation of putative biomarkers due to their enhanced throughput and specificity. Work is ongoing to expand both the linear dynamic range (
Liu et al., 2011a) and sensitivity (
Belov et al., 2011) of MRM. One recent effort increased the sensitivity of MRM experiments with “Pulsed MRM”, which uses an ion funnel trap to enhance confinement and accumulation of ions. The authors reported a 5-fold increase in peak amplitude and a 2-3 fold reduction in chemical background (
Belov et al., 2011).
Remaining challenges and emerging technologies
Large sample numbers for mass spectrometry analysis
Many conventional proteomics studies have been performed on a single or a few biologic samples. As bio-variability can be exceedingly high, the need for larger sample sizes is currently being investigated. Prentice et al. (
2010) started from 3,200 patient samples from the Women’s Health Initiative (WHI) to probe the plasma proteome for biomarkers using MS. The study did not analyze the 3,200 patient samples individually by MS, because even a simple one-hour, one-dimensional RP analysis of each sample would take months of uninterrupted instrument time. Instead, the authors pooled the samples in groups of 100, giving 32 pooled samples. To provide relevant plasma biomarkers, the pooled samples were then subjected to immunodepletion, 2-D protein separation (96 fractions total), and 1-D RPLC separation of tryptic peptides interfaced online to a mass spectrometer. Large sample cohorts help address the bio-variability that can be a concern in proteomic experiments with small sample sizes and provide ample sample amounts to investigate low abundance proteins (
Prentice et al., 2010).
Hemoglobin-derived neuropeptides and non-classical neuropeptides
Neuropeptides, such as neuropeptide Y and enkephalin, are short chains of amino acids that are secreted from a range of neuronal cells to signal nearby cells. In contrast, non-classical neuropeptides, also termed “microproteins”, are derived from intracellular protein fragments and synthesized in the cytosol (
Gelman and Fricker, 2010). MS was recently used to determine that hemopressins, which are hemoglobin-derived peptides, are upregulated in the brains of Cpe fat/fat mice. Gelman et al. designed an MS experiment to compare hemoglobin-derived peptides in the brain, blood, and heart peptidomes of mice. The authors provided data that specific hemoglobin peptides were produced in the brain and were not produced in the blood. Certain alpha and beta hemoglobin peptides were also upregulated in the brains of Cpe fat/fat mice and bind to CB1 cannabinoid receptors (
Gelman et al., 2010). As discussed earlier in the review, peptidomics and specifically neuropeptidomics are popular fields of study utilizing MS, and non-classical neuropeptides are an exciting, emerging area of research that could further expand the diversity of cell-cell signaling molecules.
Ultrasensitive mass spectrometry for single cell analysis
In addition to large scale analysis, MS-based proteomics and peptidomics are making progress into ultrasensitive single cell analysis. The most successful MS-based single cell analyses have been performed with MALDI, and studies performed on relatively large neurons are reviewed elsewhere (
Li et al., 2000). The ultrasensitive MS analysis is currently directed toward single cell analysis of smaller cells including cancer cells. The first challenge in single cell analysis is the isolation and further sample preparation to yield relevant data. Collection and isolation of a cell type can be accomplished using antibodies for fluorescence activated cell sorting (FACS) and immune magnetic separation. FACS works by flow cytometry sorting cells by a laser that excites a fluorescent tag that is attached to an antibody. Immune magnetic separation allows separation by antibodies with magnetic properties such as Dynabeads (Altelaar and Heck). One exciting study combining FACS and MS termed mass cytometry. This technology works by infusing a droplet into an inductively coupled plasma mass spectrometer (ICP-MS) containing a single cell bound to antibodies chelated to transition elements allowing a quantifying response between single cells (
Bandura et al., 2009). Clearly, the future of single cell analysis for biomarker analysis and proteomics is encouraging and has the potential to be an emerging field in MS-based proteomics and peptidomics.
Laserspray ionization (LSI)
Laserspray ionization (LSI) is an exciting new method that produces multiply charged ions under MALDI-like conditions, giving mass spectra that are nearly identical to those obtained with ESI (
McEwen et al., 2010;
Trimpin et al., 2010;
Wang et al., 2011a). Recently, it has been reported that LSI can be performed in lieu of matrix to produce a total solvent-free analysis (
Wang et al., 2011a). Generating multiply charged peptides without any solvent may offer several advantages, including MS analysis of insoluble membrane proteins or hydrophobic peptides, avoidance of chemical reactions that occur in solvents, reduced sample loss during liquid sample preparation, and avoidance of diffusion effects in tissue imaging studies (
Wang et al., 2011b).
The multiply charged peptide and protein ions produced by LSI expand the mass range for tissue imaging analysis. More importantly, the multiply charged peptide ions are amenable to electron-based fragmentation methods such as ETD or ECD, which can be employed in conjunction with tissue imaging experiments to yield in situ sequencing and identification of peptides of interest (
Inutan et al., 2010).
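The statement that multiply charged ions expand the accessible mass range follows from the standard relation m/z = (M + z × 1.00728)/z for a positively charged ion carrying z protons. The following minimal sketch illustrates this; the 12,000 Da neutral mass and the m/z 2,000 upper limit are hypothetical values chosen only for the example and are not taken from the cited studies.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_positive(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral molecule."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_states_in_range(neutral_mass, mz_limit, max_charge=30):
    """Charge states whose m/z falls at or below the analyzer's upper m/z limit."""
    return [z for z in range(1, max_charge + 1)
            if mz_positive(neutral_mass, z) <= mz_limit]


# Hypothetical 12,000 Da protein on an analyzer limited to m/z 2,000:
# the singly charged ion is out of range, but higher charge states are not.
observable = charge_states_in_range(12000.0, 2000.0)
print(observable[0])  # lowest charge state that fits within the m/z limit
```

In this illustration the singly charged ion sits near m/z 12,001 and cannot be observed, whereas charge states of 7+ and above fall below m/z 2,000, so an analyzer with a limited m/z range can still measure the protein.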
Paper spray ionization
Paper spray (PS) is an ambient ionization method that was first reported using chromatography paper, allowing detection of metabolites from dried blood spots. The original method used a cut-out piece of paper with a voltage applied through a clip attached to the back while 10 µL of methanol/H2O was applied (
Wang et al., 2010). Improvements have been made to this technology to enhance analysis efficiency, including a new solvent, 9:1 dichloromethane/isopropanol (v/v), and the use of silica paper instead of chromatography paper (
Zhang et al., 2012). Interesting applications or modifications have been made to PS including direct analysis of biologic tissue (
Wang et al., 2010) and leaf spray for direct analysis of plant materials (
Liu et al., 2011b), but both detect metabolites instead of proteins or peptides. Paper spray ionization was previously shown to enable detection of cytochrome c and bradykinin [2-9] standards in a proof-of-principle study (
Liu et al., 2010). Clearly, the utility of PS analysis in proteomics and peptidomics is yet to be explored.
niECD
New fragmentation techniques have been investigated for their utility in proteomics and peptidomics, including the recently reported negative-ion electron capture dissociation (niECD). Acidic peptides, which usually contain PTMs such as phosphorylation or sulfation, are often difficult to detect as multiply charged peptides in the positive ion mode. As discussed earlier, multiply charged peptides are required for ECD/ETD fragmentation. In niECD, fragmentation is accomplished by a multiply negatively charged peptide capturing an electron. The resulting fragmentation of multiply sulfated and phosphorylated peptide and protein standards showed no sulfate loss and preserved the phosphorylation site. The fragmentation pattern obtained for peptide anions with niECD was also improved, providing a new strategy for de novo sequencing with PTM localization (
Yoo et al., 2011).
Conclusions and perspectives
Proteomics methodologies have produced large data sets of proteins involved in various biologic and disease progression processes. Numerous mass spectrometry-based proteomics and peptidomics tools have been developed and are continuously being improved, both in chromatographic or electrophoretic separation and in MS hardware and software. However, several important issues remain to be addressed, and their resolution relies on further technical advances in proteomics analysis. When large proteomes consisting of thousands of proteins are analyzed and quantified, dynamic range is still limited, with more abundant proteins being preferentially detected. Development and optimization of chemical tagging reagents that target specific protein classes may be necessary to help enrich important signaling proteins and assess the cellular and molecular heterogeneity of the proteome and peptidome. Furthermore, a significant bottleneck in the usefulness of proteomics research is the ability to validate results and demonstrate their clear biologic relevance. The idea of P4 medicine (
Hood and Friend, 2011;
Tian et al., 2012) is an attractive concept in which the four P’s stand for predictive, preventive, personalized, and participatory. Proteomics is one of the critical “omics” fields and has enabled the development of innovative strategies for P4 medicine (
Tian et al., 2012). A goal of P4 medicine is to assess both early disease detection and disease progression in a person. A simplified example of how proteomics fits into P4 medicine is that certain brain-specific proteins could be used for the diagnosis of presymptomatic prion disease (
Tian et al., 2012). The concept of proteomic experiments providing an individual biomarker is becoming obsolete; the revised vision, now closer to reality, is a biomolecular barcode that could potentially be “scanned” as a fingerprint for a specific disease or its early onset. An excellent review on what biomarker analysis can do for true patients is available (
Belda-Iniesta et al., 2011).
Proteomics can also generate new hypotheses that can be tested by classical biochemical approaches. If a disease has an unknown pathogenesis, proteomics is a good starting point for assembling putative markers that can lead to further hypotheses for evaluation. If a particular protein or PTM is associated with a disease state either qualitatively or quantitatively, potential treatments could target that protein of interest, or investigators could monitor that protein or PTM during potential treatments of the disease. Proteomics has expanded greatly over the last few decades, with the goal of providing revealing insights into some of the most complex biologic problems currently facing the scientific community.
Higher Education Press and Springer-Verlag Berlin Heidelberg