A comprehensive survey on graph neural network accelerators
Jingyu LIU, Shi CHEN, Li SHEN
Deep learning has achieved superior accuracy on Euclidean-structured data. Non-Euclidean data such as graphs carries richer structural information, and applying neural networks to it makes more complex and practical problems tractable. However, real-world graphs obey a power-law degree distribution, so the adjacency matrix of a graph is irregular and sparse. Graph processing accelerators (GPAs) were designed to cope with this irregularity, but traditional graph computing operates on one-dimensional vertex properties, whereas in graph neural networks (GNNs) every vertex carries a multi-dimensional feature vector. Consequently, GNN execution combines traditional graph processing, which exhibits irregular memory accesses, with neural-network computation, which is regular. To extract more information from graph data and to improve model generalization, GNN models are growing deeper, so the memory-access and computation overheads become considerable. GNN accelerators are being designed to address these issues. In this paper, we conduct a systematic survey of the design and implementation of GNN accelerators. Specifically, we review the challenges GNN accelerators face and examine in detail the existing works that address them. Finally, we evaluate previous works and propose future directions for this booming field.
graph neural network / accelerators / graph convolutional networks / design space exploration / deep learning / domain-specific architecture
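The central observation above is that a GNN layer interleaves two phases with opposite hardware characteristics. As a minimal illustration (our own SciPy sketch, not code from any surveyed accelerator), the GCN layer below computes H' = ReLU(A_hat · H · W): the aggregation step A_hat · H is a sparse matrix product whose memory accesses follow the irregular, power-law graph structure, while the combination step with W is a regular dense GEMM of the kind conventional neural-network hardware handles well.

```python
# Minimal sketch of one GCN layer, illustrating the two execution phases:
# sparse, irregular aggregation over the graph (SpMM) followed by
# regular, dense combination (GEMM). Illustrative only.
import numpy as np
import scipy.sparse as sp

def gcn_layer(adj: sp.csr_matrix, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One propagation step: H' = ReLU(A_hat @ H @ W)."""
    # Symmetrically normalize the adjacency matrix with self-loops,
    # following the GCN formulation: A_hat = D^-1/2 (A + I) D^-1/2.
    a = adj + sp.identity(adj.shape[0], format="csr")
    d_inv_sqrt = np.power(np.asarray(a.sum(axis=1)).ravel(), -0.5)
    a_hat = a.multiply(d_inv_sqrt[:, None]).multiply(d_inv_sqrt[None, :]).tocsr()

    # Aggregation phase: sparse-dense product driven by graph structure.
    # Power-law degree distributions make these accesses irregular.
    aggregated = a_hat @ features

    # Combination phase: dense matrix multiply, the regular NN-style
    # computation that maps well onto systolic arrays and tensor cores.
    return np.maximum(aggregated @ weight, 0.0)

# Toy 4-vertex graph with 8-dimensional features and a 16-unit layer.
rows, cols = [0, 1, 1, 2, 2, 3], [1, 0, 2, 1, 3, 2]
adj = sp.csr_matrix((np.ones(6), (rows, cols)), shape=(4, 4))
rng = np.random.default_rng(0)
h_next = gcn_layer(adj, rng.standard_normal((4, 8)), rng.standard_normal((8, 16)))
print(h_next.shape)  # (4, 16)
```

The accelerators surveyed in this paper differ chiefly in how they map, schedule, and overlap these two phases on hardware, which is why hybrid and phase-aware designs recur throughout the literature.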
Jingyu Liu received his master's degree in integrated circuit engineering from the National University of Defense Technology, China, in 2021. He is now working toward his PhD degree with the School of Computer, National University of Defense Technology, China. His research interests include computer architecture and graph-based hardware accelerators.
Shi Chen received his bachelor's degree in Computer Science & Technology from the National University of Defense Technology, China, in 2021. He is now working toward his PhD degree with the School of Computer, National University of Defense Technology, China. His research interests include computer architecture and graph-based hardware accelerators.
Li Shen received his BS and PhD degrees in Computer Science & Technology from the National University of Defense Technology, China. He is currently a professor at the School of Computer, National University of Defense Technology, China. His research interests include high-performance processor architecture, parallel programming, and performance optimization techniques.