
The TH Express high performance interconnect networks
Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO
Frontiers of Computer Science, 2014, Vol. 8, Issue 3: 357-366.
The interconnection network plays an important role in scalable high performance computing (HPC) systems. The TH Express-2 interconnect is used in the MilkyWay-2 system to provide high-bandwidth, low-latency interprocessor communication, and continuous effort is devoted to the development of our proprietary interconnect. This paper describes the state of the art of our proprietary interconnect, with particular emphasis on the design of the network interface. Several key features are introduced, including user-level communication, remote direct memory access (RDMA), offloaded collective operations, and hardware-supported reliable end-to-end communication. The design of a low-level message passing infrastructure and of the upper-level message passing services built on it is also presented. Preliminary performance results demonstrate the efficiency of the TH interconnect interface.
Keywords: HPC / network interface chip (NIC) / TH Express interconnect / offload collective operation
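To make the abstract's notion of user-level communication concrete, the sketch below models the descriptor-queue-and-doorbell pattern that kernel-bypass NICs generally rely on: the application fills a work descriptor in a ring mapped into its address space and notifies the NIC without a system call. This is a minimal illustration under that general assumption; the names (th_desc, th_queue, th_post_rdma_put) are hypothetical and do not reflect the actual TH Express-2 interface, whose details are given in the full paper.

```c
/* Minimal sketch of posting a user-level RDMA put.
 * All names here are hypothetical illustrations, not the TH Express API. */
#include <stdint.h>
#include <stdio.h>

#define QUEUE_DEPTH 64

typedef struct {            /* one work descriptor, as the NIC would read it */
    uint64_t local_addr;    /* source buffer (registered, pinned memory) */
    uint64_t remote_addr;   /* destination address on the target node */
    uint32_t length;        /* bytes to transfer */
    uint32_t remote_node;   /* target endpoint id */
} th_desc;

typedef struct {
    th_desc  ring[QUEUE_DEPTH]; /* descriptor ring mapped into user space */
    uint32_t head;              /* producer index (application) */
    uint32_t tail;              /* consumer index (NIC) */
} th_queue;

/* Post an RDMA put: fill the next descriptor, then "ring the doorbell".
 * In real hardware the doorbell is a write to a memory-mapped NIC
 * register; here it is simply the head-index update. */
static int th_post_rdma_put(th_queue *q, const void *buf,
                            uint64_t remote_addr, uint32_t len,
                            uint32_t node)
{
    uint32_t next = (q->head + 1) % QUEUE_DEPTH;
    if (next == q->tail)
        return -1;                       /* ring full: caller must retry */
    th_desc *d = &q->ring[q->head];
    d->local_addr  = (uint64_t)(uintptr_t)buf;
    d->remote_addr = remote_addr;
    d->length      = len;
    d->remote_node = node;
    q->head = next;                      /* doorbell: NIC sees new work */
    return 0;
}

int main(void)
{
    static th_queue q;                   /* zero-initialized ring */
    char payload[] = "hello";
    if (th_post_rdma_put(&q, payload, 0x1000, sizeof payload, 3) == 0)
        printf("posted put of %zu bytes to node 3\n", sizeof payload);
    return 0;
}
```

In a real design the descriptor ring lives in registered (pinned) memory so the NIC can DMA directly from user buffers; it is this combination of memory-mapped doorbells and pre-registered buffers that lets the send path bypass the operating system kernel entirely.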