Exploring high-performance processor architecture beyond the exascale

Xiang-hui XIE, Xun JIA

Front. Inform. Technol. Electron. Eng ›› 2018, Vol. 19 ›› Issue (10) : 1224 -1229.

DOI: 10.1631/FITEE.1800424
Perspectives

Abstract

The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As an integral part of a supercomputing system, high-performance processors and their architecture designs are crucial in improving system performance. In this paper, three architecture design goals for high-performance processors beyond the exascale are introduced: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. Then a high-performance many-core processor architecture with scalar processing and application-specific acceleration (Massa) is proposed, which aims to achieve these three goals by employing the techniques of distributed computational resources and application-customized hardware. Finally, some future research directions regarding the Massa architecture are discussed.

Keywords

High-performance computing / Beyond the exascale / Processor architecture / Application-customized hardware / Distributed computational resources

Cite this article

Xiang-hui XIE, Xun JIA. Exploring high-performance processor architecture beyond the exascale. Front. Inform. Technol. Electron. Eng, 2018, 19(10): 1224-1229 DOI:10.1631/FITEE.1800424


RIGHTS & PERMISSIONS

Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature
