Exploring high-performance processor architecture beyond the exascale

Xiang-hui XIE; Xun JIA

doi:10.1631/FITEE.1800424


Front. Inform. Technol. Electron. Eng, 2018, Vol. 19, Issue 10: 1224-1229. DOI: 10.1631/FITEE.1800424

Perspectives



Abstract

The ever-increasing demand for performance in scientific computing and engineering applications will push high-performance computing beyond the exascale. As integral parts of a supercomputing system, high-performance processors and their architecture designs are crucial to improving system performance. This paper introduces three architecture design goals for high-performance processors beyond the exascale: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. It then proposes Massa, a high-performance many-core processor architecture with scalar processing and application-specific acceleration, which aims to achieve these three goals by employing distributed computational resources and application-customized hardware. Finally, some future research directions for the Massa architecture are discussed.

Keywords

High-performance computing / Beyond the exascale / Processor architecture / Application-customized hardware / Distributed computational resources

Cite this article


Xiang-hui XIE, Xun JIA. Exploring high-performance processor architecture beyond the exascale. Front. Inform. Technol. Electron. Eng, 2018, 19(10): 1224‒1229 https://doi.org/10.1631/FITEE.1800424

