The TH Express high performance interconnect networks

Zhengbin PANG; Min XIE; Jun ZHANG; Yi ZHENG; Guibin WANG; Dezun DONG; Guang SUO

doi:10.1007/s11704-014-3500-9

Front. Comput. Sci. ›› 2014, Vol. 8 ›› Issue (3) : 357-366. DOI: 10.1007/s11704-014-3500-9

RESEARCH ARTICLE

Abstract

Interconnection networks play an important role in scalable high performance computer (HPC) systems. The TH Express-2 interconnect has been used in the MilkyWay-2 system to provide high-bandwidth, low-latency interprocessor communication, and continuous effort is devoted to the development of our proprietary interconnect. This paper describes the current state of our proprietary interconnect, with an emphasis on the design of the network interface. Several key features are introduced, including user-level communication, remote direct memory access, offloaded collective operations, and hardware-reliable end-to-end communication. The designs of a low-level message passing infrastructure and an upper-level message passing service are also proposed. Preliminary performance results demonstrate the efficiency of the TH interconnect interface.

Keywords

HPC / network interface chip (NIC) / TH Express interconnect / offload collective operation

Cite this article

Zhengbin PANG, Min XIE, Jun ZHANG, Yi ZHENG, Guibin WANG, Dezun DONG, Guang SUO. The TH Express high performance interconnect networks. Front. Comput. Sci., 2014, 8(3): 357‒366 https://doi.org/10.1007/s11704-014-3500-9
