Accelerating the cryo-EM structure determination in RELION on GPU cluster
Xin YOU, Hailong YANG, Zhongzhi LUAN, Depei QIAN
Front. Comput. Sci., 2022, Vol. 16, Issue 3: 163102
Cryo-electron microscopy (cryo-EM) is one of the most powerful technologies available today for structural biology. RELION (Regularized Likelihood Optimization), one of the most widely used software packages in this field, implements a Bayesian algorithm for cryo-EM structure determination. Many researchers have devoted effort to improving the performance of RELION to keep pace with the ever-increasing volume of datasets. In this paper, we analyze the performance of the most time-consuming computation steps in RELION and identify their bottlenecks as targets for specific optimizations. We propose several optimization strategies to improve the overall performance of RELION: optimization of the expectation step, parallelization of the maximization step, acceleration of the symmetry computation, and memory affinity optimization. The experimental results show that our proposed optimizations achieve significant speedups of RELION across representative datasets. In addition, we perform roofline model analysis to understand the effectiveness of our optimizations.
cryo-EM structure determination / performance optimization / GPU acceleration / RELION
Higher Education Press