Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics

Qingzu He, Xiang Li, Jinjin Zhong, Gen Yang, Jiahuai Han, Jianwei Shuai

Smart Medicine ›› 2024, Vol. 3 ›› Issue (3) : e20240014

DOI: 10.1002/SMMD.20240014
RESEARCH ARTICLE


Abstract

Peptide spectrum matching is the process of linking mass spectrometry data with peptide sequences. An experimental spectrum can match thousands of candidate peptides, and variable modifications increase the number of candidates exponentially, so completing the search within a limited time is a key challenge. Traditional search engines speed up the process by restricting peptide mass errors and the number of variable modifications, but this limits their interpretive capability. To address this challenge, we propose Dear-PSM, a peptide search engine that supports full database searching. Dear-PSM does not restrict peptide mass errors: it matches each spectrum against all peptides in the database and raises the number of variable modifications allowed per peptide from the conventional 3 to 20. Using inverted index technology, Dear-PSM builds a high-performance index table of the experimental spectra, and it uses deep learning algorithms for peptide validation. With these techniques, Dear-PSM runs 7 times faster than mainstream search engines on a regular desktop computer while reducing memory consumption 240-fold. Benchmark tests show that, in full database search mode, Dear-PSM reproduces over 90% of the results obtained by mainstream search engines on complex mass spectrometry data collected from different species with various instruments. It also uncovers a substantial number of new peptides and proteins. Dear-PSM is publicly available on GitHub at https://github.com/jianweishuai/Dear-PSM.
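
As a rough illustration of the inverted-index idea named in the abstract, the sketch below bins the fragment peaks of experimental spectra by m/z and maps each bin to the spectra that contain a peak there. A candidate peptide's theoretical fragment m/z values can then be looked up against every spectrum at once, with no filtering by precursor mass. This is not Dear-PSM's implementation: the bin width, the function names `build_spectrum_index` and `count_shared_peaks`, the toy peak lists, and the shared-peak count used as a score are all assumptions made for the example.

```python
from collections import defaultdict

BIN_WIDTH = 0.02  # assumed fragment m/z bin width in Da (hypothetical value)


def build_spectrum_index(spectra, bin_width=BIN_WIDTH):
    """Map each m/z bin to the set of spectrum ids with a peak in that bin."""
    index = defaultdict(set)
    for spectrum_id, peaks in spectra.items():
        for mz in peaks:
            index[int(mz / bin_width)].add(spectrum_id)
    return index


def count_shared_peaks(index, fragment_mzs, bin_width=BIN_WIDTH):
    """Count, per spectrum, how many theoretical fragments land in an indexed bin."""
    hits = defaultdict(int)
    for mz in fragment_mzs:
        # index.get avoids creating empty entries for bins with no peaks
        for spectrum_id in index.get(int(mz / bin_width), ()):
            hits[spectrum_id] += 1
    return dict(hits)


# Toy experimental spectra (hypothetical peak lists, m/z only).
spectra = {
    "s1": [175.119, 304.161, 401.214],
    "s2": [175.119, 262.151, 500.309],
}
# Toy theoretical fragments of one candidate peptide (hypothetical values).
fragments = [175.119, 304.161, 999.507]

index = build_spectrum_index(spectra)
print(count_shared_peaks(index, fragments))  # {'s1': 2, 's2': 1}
```

Indexing the spectra rather than the peptides means each candidate peptide costs one lookup per fragment, however many spectra were acquired. A real engine would also need mass tolerance handling at bin edges, peak intensities, and a proper scoring function.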

Keywords

deep learning / inverted index / mass spectrometry / peptide search / proteomics

Cite this article

Qingzu He, Xiang Li, Jinjin Zhong, Gen Yang, Jiahuai Han, Jianwei Shuai. Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics. Smart Medicine, 2024, 3(3): e20240014. DOI: 10.1002/SMMD.20240014


RIGHTS & PERMISSIONS

2024 The Author(s). Smart Medicine published by Wiley-VCH GmbH on behalf of Wenzhou Institute, University of Chinese Academy of Sciences.
