INTRODUCTION
EXPERIMENTAL SECTION
Workflow of MS2pep
MS2 spectrum self-demultiplexing
Large precursor mass tolerance database searching
Search data refinement
FDR estimation
Putative modification analysis
Precursor signal filtering
Modification score calculation
Precursor-fragment correlation
Rank of candidates
Phosphorylation site analysis
DIA dataset and protein sequence databases
Parameter settings of library-free tools
Protein quantification analysis
Library generation
Quantification with EncyclopeDIA
Bioinformatics analysis
Data and code
RESULTS AND DISCUSSION
Framework of DIA-MS2pep
Fig. 1 Framework of DIA-MS2pep. A DIA-MS2pep iteratively generates pseudo-spectra from DIA data by spectrum self-demultiplexing, using MS2 data only. B Each pseudo-spectrum is assigned the center m/z of its isolation window and searched with a DDA search engine using a large precursor mass tolerance strategy. After rigorous data refinement, including verification of precursor evidence, searching for modified forms and computation of auxiliary peptide scores, all target and decoy peptide hits are submitted to Percolator to estimate the false discovery rate, and peptide and protein results are reported at a q-value < 0.01.
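The precursor assignment described in panel B can be illustrated with a minimal sketch. It assumes that the search tolerance equals half the isolation window width and that matching is done in m/z space without charge handling; both are simplifications made here for illustration. The names `IsolationWindow` and `within_window_tolerance` are hypothetical and do not come from DIA-MS2pep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolationWindow:
    """A DIA isolation window given by its lower and upper m/z bounds."""
    lower_mz: float
    upper_mz: float

    @property
    def center_mz(self) -> float:
        # Nominal precursor m/z assigned to every pseudo-spectrum of this window.
        return (self.lower_mz + self.upper_mz) / 2.0

    @property
    def half_width(self) -> float:
        # Assumed tolerance: half the window width (illustrative choice).
        return (self.upper_mz - self.lower_mz) / 2.0


def within_window_tolerance(candidate_mz: float, window: IsolationWindow) -> bool:
    """True if a candidate precursor m/z falls inside the tolerance around the center."""
    return abs(candidate_mz - window.center_mz) <= window.half_width


window = IsolationWindow(lower_mz=500.0, upper_mz=520.0)
print(window.center_mz)                         # nominal precursor m/z
print(within_window_tolerance(505.3, window))   # candidate inside the window
print(within_window_tolerance(521.0, window))   # candidate outside the window
```

Under these assumptions, any peptide whose precursor m/z lies inside the isolation window remains a candidate for the pseudo-spectrum, which is why the subsequent refinement and FDR control steps are needed.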
The rationale of spectrum self-demultiplexing
Performance evaluation of spectrum self-demultiplexing
Fig. 2 Performance evaluation of spectrum self-demultiplexing. A Comparison of the number of unique peptides identified from the HeLa_DIA dataset using DIA-MS2pep and DIA-Umpire at 1.0% FDR, estimated using either PeptideProphet or Percolator. B The fractions of matched fragments in DDA spectra, DIA spectra and pseudo-spectra generated by DIA-MS2pep and DIA-Umpire, calculated as the length of the longest peptide sequence stretch covered by consecutive b- or y-ions divided by the peptide length. The peptide ions used for the violin plots are those identified in common from DDA data and from the pseudo-spectra generated by DIA-MS2pep and DIA-Umpire (n = 8556). C The identification rate as a function of the cosine similarity of pseudo-spectra generated by DIA-MS2pep and DIA-Umpire from one given DIA spectrum.
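The fragment coverage metric in panel B can be sketched as follows. This is one plausible reading of the caption, not the authors' implementation: matched ions are given as integer ion indices (for example, 3 for b3), the longest run of consecutive indices is found in each series, and the larger run is divided by the peptide length. The function names are hypothetical.

```python
def longest_run(indices):
    """Length of the longest run of consecutive integers in `indices`."""
    present = set(indices)
    best = 0
    for i in present:
        if i - 1 not in present:  # i is the first index of a run
            length = 1
            while i + length in present:
                length += 1
            best = max(best, length)
    return best


def consecutive_coverage(peptide_length, matched_b, matched_y):
    """Longest consecutive b- or y-ion run divided by the peptide length."""
    if peptide_length <= 0:
        raise ValueError("peptide_length must be positive")
    return max(longest_run(matched_b), longest_run(matched_y)) / peptide_length


# A 10-residue peptide with matched b2-b4, b7 and y1, y2, y5-y9:
# the longest b run has 3 ions and the longest y run has 5 ions.
print(consecutive_coverage(10, [2, 3, 4, 7], [1, 2, 5, 6, 7, 8, 9]))
```

Taking the maximum over the two ion series, rather than merging them, keeps the metric tied to a single uninterrupted ladder, which is the situation where sequence assignment is least ambiguous.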
Peptide identification with large precursor mass tolerance database search
Performance evaluation of DIA-MS2pep on GPF DIA data
Fig. 3 Evaluation of DIA-MS2pep using the HeLa_GPF_DIA and PhosphopPep_DIA datasets. A The number of unique peptides identified from the HeLa_GPF_DIA dataset by DIA-MS2pep, DIA-Umpire and PECAN. B The numbers of unique peptides reported by DIA-MS2pep, DIA-Umpire and PECAN when searching against four species databases (H. sapiens, C. elegans, S. cerevisiae and E. coli). The percentage of peptides not from H. sapiens is labelled (red). C The percentage decrease in peptide numbers reported by DIA-MS2pep, DIA-Umpire and PECAN when searching against the four species databases relative to searching against the H. sapiens database only. D The number of correctly localized phosphopeptides (200 synthetic peptides in total) identified from a diluted yeast background (PhosphopPep_DIA dataset) (Bekker-Jensen et al. 2020a).
Identifying the peptides with PTMs from DIA data
Fig. 4 Comprehensive analysis of the Plasma_GPF_DIA dataset. A The number of unique peptides identified by DIA-MS2pep, DIA-Umpire and PECAN from the Plasma_GPF_DIA dataset. B Fifteen glycated peptides (Hex[K]) identified by DIA-MS2pep from the Plasma_GPF_DIA dataset. Sites reported in the UniProt database are marked “Y”; all other sites are marked “N” (red). PTMScores, with site probabilities given in parentheses, are calculated by DIA-MS2pep to evaluate site localization confidence. C, D An example of DIA-MS2pep pseudo-spectra from the Plasma_GPF_DIA dataset (Panel C) versus DDA spectra (Panel D) from the in vitro glycation experiment sample (Supplementary Methods). In the spectra, b- and y-ions are shown in purple and blue, respectively. The neutral loss peaks of glycation (H6O3, −54 Da) are denoted as b* and y* ions.
Building spectral library from DIA data with DIA-MS2pep
Application of DIA-MS2pep to real biological DIA data
Fig. 5 Spectral library built directly from DIA data. Quantitative analysis of the HeLa_Serum_DIA dataset using five different spectral libraries built from either DIA data (DIA-MS2pep_Lib), GPF DIA data plus DIA data (DIA-MS2pep_GPF_Lib) or DDA data (DDA_Lib). A Violin plots combined with box plots show the distribution of the coefficient of variation (CV) of the proteins quantified with each spectral library. All box plots indicate the median and IQR, and the whiskers show the 25th and 75th percentiles. The median CVs are indicated. B The numbers of quantified and differentially expressed (DE) proteins over time, with an FDR < 0.01 as reported by edgeR (Lund et al. 2012). C Heat map of Reactome pathway enrichment analysis using the differentially expressed proteome from the HeLa_Serum_DIA dataset. Pathways with a p-value < 0.01 are indicated by asterisks. The Stat. mean values represent the average magnitude and direction of fold changes (the serum starvation experiment at 0 h was set as the control) at the gene set level, for upregulation (red) and downregulation (blue).
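The CV summary in panel A can be reproduced in outline with the standard library. This sketch assumes the CV is the sample standard deviation divided by the mean of a protein's replicate intensities, which is the usual definition but is not spelled out in the caption. The protein identifiers and intensities below are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (needs at least 2 values)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return statistics.stdev(values) / mean


def median_cv(protein_intensities):
    """Median CV over all proteins; input maps protein id -> replicate intensities."""
    return statistics.median(
        coefficient_of_variation(v) for v in protein_intensities.values()
    )


# Hypothetical replicate intensities for three proteins.
replicates = {
    "P1": [10.0, 10.0, 10.0],
    "P2": [8.0, 10.0, 12.0],
    "P3": [5.0, 10.0, 15.0],
}
print(median_cv(replicates))
```

Reporting the median rather than the mean CV makes the summary robust to a small number of poorly quantified proteins with very large variation.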