Hybrid hierarchy storage system in MilkyWay-2 supercomputer
Weixia XU, Yutong LU, Qiong LI, Enqiang ZHOU, Zhenlong SONG, Yong DONG, Wei ZHANG, Dengping WEI, Xiaoming ZHANG, Haitao CHEN, Jianying XING, Yuan YUAN
With the rapid improvement of computation capability in high performance supercomputer systems, the performance gap between the computation subsystem and the storage subsystem has become increasingly serious, especially as applications produce data sets ranging from tens of gigabytes up to terabytes. To narrow this gap, large-scale storage systems must be designed and implemented with high performance and scalability. The MilkyWay-2 (TH-2) supercomputer, with a peak performance of 54.9 Pflops, clearly imposes such requirements on its storage system. This paper introduces the storage system of the MilkyWay-2 supercomputer, including its hardware architecture and parallel file system. The storage system exploits a novel hybrid hierarchy storage architecture to enable high scalability of I/O clients, I/O bandwidth, and storage capacity. To fit this architecture, a user-level virtualized file system, named H2FS, is designed and implemented; it combines local storage and shared storage into a single dynamic namespace to optimize I/O performance for I/O-intensive applications. The evaluation results show that the storage system of the MilkyWay-2 supercomputer satisfies the critical requirements of large-scale supercomputers, such as performance and scalability.
supercomputer / storage system / file system / MilkyWay-2 / hybrid / hierarchy