The TH Express high performance interconnect networks
Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO
The TH Express high performance interconnect networks
Interconnection network plays an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in MilkyWay-2 system to provide high-bandwidth and low-latency interprocessor communications, and continuous efforts are devoted to the development of our proprietary interconnect. This paper describes the state-of-the-art of our proprietary interconnect, especially emphasizing on the design of network interface. Several key features are introduced, such as user-level communication, remote direct memory access, offload collective operation, and hardware reliable end-to-end communication, etc. The design of a low level message passing infrastructures and an uppermessage passing services are also proposed. The preliminary performance results demonstrate the efficiency of the TH interconnect interface.
HPC / network interface chip (NIC) / TH Express interconnect / offload collective operation
[1] |
Top500, http://www.top500.org, 2013
|
[2] |
LiaoK X, XiaoQ L, YangQ C, LuT Y. MilkyWay-2 supercomputer system and application. Submitted to Frontiers of Computer Science, 2013
|
[3] |
PritchardH, GorodetskyI, BuntinasD. A ugni-based mpich2 nemesis network module for the cray xe. In: Proceedings of the 18th European MPI Users’ Group Conference on Recent Advances in the Message Passing Interface. 2011, 110-119
CrossRef
Google scholar
|
[4] |
XieM, LuY, LiuL, CaoH, YangX. Implementation and evaluation of network interface and message passing services for Tianhe-1a supercomputer. In: Proceedings of the 19th IEEE Annual Symposium on High Performance Interconnects. 2011, 78-86
|
[5] |
ChunB N, MainwaringA, CullerD E. Virtual network transport protocols for myrinet. IEEE Micro, 1998, 18(1): 53-63
CrossRef
Google scholar
|
[6] |
ArakiS, BilasA, DubnickiC, EdlerJ, KonishiK, PhilbinJ. Userspace communication: a quantitative study. In: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM). 1998, 1-16
|
[7] |
BhoedjangR A, RuhlT, BalH E. User-level network interface protocols. Computer, 1998, 31(11): 53-60
CrossRef
Google scholar
|
[8] |
SchoinasI, HillM D. Address translation mechanisms in network interfaces. In: Proceedings of the 4th International Symposium on High-Performance Computer Architecture. 1998, 219-230
|
[9] |
InfiniBand Architecture Specification: Release 1.0. InfiniBand Trade Association, 2000
|
[10] |
GrahamR L, PooleS, ShamisP, BlochG, BlochN, ChapmanH, KaganM, ShaharA, RabinovitzI, ShainerG. Overlapping computation and communication: Barrier algorithms and connectx-2 core-direct capabilities. In: Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum. 2010, 1-8
|
[11] |
KandallaK, SubramoniH, VienneJ, RaikarS P, TomkoK, SurS, PandaD K. Designing non-blocking broadcast with collective offload on infiniband clusters: A case study with hpl. In: Proceedings of the 19th IEEE Annual Symposium on High Performance Interconnects. 2011, 27-34
|
[12] |
MPICH2: High-performance and Widely Portable MPI. http://www.mcs.anl.gov/research/projects/mpich2/
|
[13] |
BuntinasD, GoglinB, GoodellD, MercierG, MoreaudS. Cacheefflcient, intranode, large-message mpi communication with mpich2-nemesis. In: Proceedings of the 2009 International Conference on Parallel Processing. 2009, 462-469
CrossRef
Google scholar
|
[14] |
LauriaM, PakinS, ChienA. Efflcient layering for high speed communication: Fast messages 2. x. In: Proceedings of the 7th International Symposium on High Performance Distributed Computing. 1998, 10-20
|
[15] |
LiuJ, PandaD K. Implementing efflcient and scalable flow control schemes in MPI over infiniband. In: Proceedings of the 2004 International Parallel and Distributed Processing Symposium. 2004, 183b
|
[16] |
TezukaH, O’CarrollF, HoriA, IshikawaY. Pin-down cache: a virtual memory management technique for zero-copy communication. In: Proceedings of the 1998 Symposium on Parallel and Distributed Processing. 1998, 308-314
|
[17] |
MVAPICH: MPI over InfiniBand, 10GigE/iWARP and RoCE, 201318.
|
[18] |
VetterJ S, MuellerF. Communication characteristics of large-scale scientific applications for contemporary cluster architectures. Journal of Parallel and Distributed Computing, 2003, 63(9): 853-865
CrossRef
Google scholar
|
[19] |
ChiuG. The IBM blue gene project. IBM Journal of Research and Development, 2013, 57(1): 1-6
|
[20] |
ChenD, EisleyN A, HeidelbergerP, SengerR M, SugawaraY, KumarS, SalapuraV, SatterfieldD L, Steinmacher-BurowB, ParkerJ J. The IBM blue gene/q interconnection fabric. IEEE Micro, 2012, 32(1): 32-43
CrossRef
Google scholar
|
[21] |
AjimaY, TakagiY, InoueT, HiramotoS, ShimizuT. The tofu interconnect. In: Proceedings of the 19th IEEE Annual Symposium on High Performance Interconnects. 2011, 87-94
|
[22] |
AlversonR, RowethD, KaplanL. The gemini system interconnect. In: Proceedings of the 18th IEEE Annual Symposium on High Performance Interconnects. 2010, 83-87
|
[23] |
SchroederB, GibsonG A. Understanding failures in petascale computers. In: Journal of Physics: Conference Series. 2007, Article 012022
|
[24] |
GrahamR L, PooleS, ShamisP, BlochG, BlochN, ChapmanH, KaganM, ShaharA, RabinovitzI, ShainerG. Connectx-2 infiniband management queues: first investigation of the new support for network offloaded collective operations. In: Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. 2010, 53-62
|
[25] |
SubramoniH, KandallaK, SurS, PandaD K. Design and evaluation of generalized collective communication primitives with overlap using connectx-2 offload engine. In: Proceedings of the 18th IEEE Annual Symposium on High Performance Interconnects. 2010, 40-49
|
[26] |
ArimilliB, ArimilliR, ChungV, ClarkS, DenzelW, DrerupB, HoeflerT, JoynerJ, LewisJ, LiJ. The percs high-performance interconnect. In: Proceedings of the 18th IEEE Annual Symposium on High Performance Interconnects. 2010, 75-82
|
/
〈 | 〉 |