Accelerating the cryo-EM structure determination in RELION on GPU cluster
Xin YOU, Hailong YANG, Zhongzhi LUAN, Depei QIAN
Cryo-electron microscopy (cryo-EM) is one of the most powerful technologies available today for structural biology. RELION (Regularized Likelihood Optimization), which implements a Bayesian algorithm for cryo-EM structure determination, is one of the most widely used software packages in this field. Many researchers have worked to improve the performance of RELION so that it can keep up with the ever-increasing volume of datasets. In this paper, we analyze the performance of the most time-consuming computation steps in RELION and identify their bottlenecks to guide targeted optimizations. We propose several optimization strategies to improve the overall performance of RELION: optimization of the expectation step, parallelization of the maximization step, acceleration of the symmetry computation, and memory affinity optimization. The experimental results show that the proposed optimizations achieve significant speedups of RELION across representative datasets. In addition, we perform roofline model analysis to understand the effectiveness of our optimizations.
cryo-EM structure determination / performance optimization / GPU acceleration / RELION