Storagewall for exascale supercomputing
Wei HU, Guang-ming LIU, Qiong LI, Yan-huang JIANG, Gui-lin CAI
The mismatch between compute performance and I/O performance has long been a stumbling block as supercomputers evolve from petaflops to exaflops. Many parallel applications today are I/O intensive, and their overall running times are typically limited by I/O performance. To quantify the I/O performance bottleneck and highlight the significance of achieving scalable performance in peta/exascale supercomputing, this paper introduces, for the first time, a formal definition of the ‘storage wall’ from the perspective of parallel application scalability. We quantify the effects of the storage bottleneck by proposing a storage-bounded speedup, defining the storage wall quantitatively, presenting existence theorems for the storage wall, and classifying system architectures according to their I/O performance variation. We analyze and extrapolate the existence of the storage wall through experiments on Tianhe-1A and case studies on Jaguar. These results provide insights into how to alleviate the storage wall bottleneck in system design and how to achieve hardware/software optimizations in peta/exascale supercomputing.
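The idea of a storage-bounded speedup can be illustrated with a deliberately simplified model. The sketch below is not the paper's definition of storage-bounded speedup or of the storage wall; it assumes that compute work parallelizes perfectly across p processes and that I/O time equals a fixed I/O volume divided by the aggregate storage bandwidth available at scale p. All names (`toy_storage_bounded_speedup`, `compute_work`, `io_work`, `io_bandwidth`, `fixed_bandwidth`, `scaling_bandwidth`) are hypothetical.

```python
from typing import Callable


def toy_storage_bounded_speedup(
    p: int,
    compute_work: float,
    io_work: float,
    io_bandwidth: Callable[[int], float],
) -> float:
    """Speedup on p processes under a hypothetical model (not the paper's formula).

    Compute time divides evenly across p processes; I/O time is io_work divided
    by the aggregate storage bandwidth io_bandwidth(p) available at scale p.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    t_serial = compute_work + io_work / io_bandwidth(1)
    t_parallel = compute_work / p + io_work / io_bandwidth(p)
    return t_serial / t_parallel


def fixed_bandwidth(_p: int) -> float:
    # Hypothetical shared storage whose aggregate bandwidth does not grow with p.
    return 1.0


def scaling_bandwidth(p: int) -> float:
    # Hypothetical storage whose aggregate bandwidth grows linearly with p.
    return float(p)


if __name__ == "__main__":
    for p in (1, 10, 100, 1000):
        fixed = toy_storage_bounded_speedup(p, 100.0, 10.0, fixed_bandwidth)
        scaled = toy_storage_bounded_speedup(p, 100.0, 10.0, scaling_bandwidth)
        print(f"p={p:5d}  fixed-bandwidth={fixed:7.2f}  scaling-bandwidth={scaled:8.2f}")
```

Under these assumptions, fixed aggregate bandwidth caps the speedup at (100 + 10) / 10 = 11 no matter how many processes are added, whereas bandwidth that scales with p keeps the speedup linear. This is only a qualitative picture of an I/O-imposed ceiling in the spirit of Amdahl's law; the paper's own formulation may differ.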
Storage-bounded speedup / Storage wall / High performance computing / Exascale computing