Blockchain based federated learning for intrusion detection for Internet of Things

Nan SUN, Wei WANG, Yongxin TONG, Kexin LIU

Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (5) : 185328. DOI: 10.1007/s11704-023-3026-8
Artificial Intelligence
RESEARCH ARTICLE


Abstract

In the Internet of Things (IoT), data sharing among different devices can improve manufacturing efficiency and reduce workload, yet it also makes network systems more vulnerable to various intrusion attacks. There is a realistic demand to develop efficient intrusion detection algorithms for connected devices. Most existing intrusion detection methods are trained in a centralized manner and are incapable of identifying new, unlabeled attack types. In this paper, a distributed federated intrusion detection method is proposed, which utilizes the information contained in the labeled data as prior knowledge to discover new unlabeled attack types. Besides, the blockchain technique is introduced into the federated learning process for the consensus of the entire framework. Experimental results show that our approach can identify malicious entities while outperforming existing methods in discovering new intrusion attack types.


Keywords

intrusion detection / federated learning / new attack discovery / blockchain

Cite this article

Nan SUN, Wei WANG, Yongxin TONG, Kexin LIU. Blockchain based federated learning for intrusion detection for Internet of Things. Front. Comput. Sci., 2024, 18(5): 185328 https://doi.org/10.1007/s11704-023-3026-8

1 Introduction

In recent years, deep learning models such as convolutional neural networks (CNNs), transformers, and recurrent neural networks (RNNs) have performed well on Euclidean-structured data, achieving outstanding prediction results. In practice, while deep learning has gained great success on Euclidean data, an increasing number of applications generate data from non-Euclidean domains that need to be analyzed. Graph data is a typical representative of non-Euclidean structures: it contains abundant structural information, allows each node to be updated with the properties of its neighbors, and its structure matches how information propagates in the real world. To obtain more accurate predictions on graph data, the graph neural network (GNN) [1-25] came into existence. A GNN is a deep learning model for handling graph data, which brings machine learning to graph-structured data. Various GNN models have since been proposed: graph convolutional networks (GCNs) [7-19,26-35], graph recurrent networks (GRNs) [4], graph attention networks (GATs) [2], graph auto-encoders (GAEs) [1], graph generative networks (GGNs) [3], etc. These GNN models play an important role in applications such as node classification, graph clustering, link prediction, node clustering, protein network analysis, and so on.
The execution of GNN models integrates the features of graph computation [36-46] and neural networks: it aggregates neighbor nodes' information through the product of the adjacency matrix and the vertex properties, and updates each vertex's property through a multilayer perceptron (MLP). As a result, the GNN model exhibits hybrid execution patterns in computation and memory access: the aggregation phase features irregular memory access and computation, while the combination phase is regular in both. Moreover, GNN datasets are composed of real-world graph data containing enormous numbers of nodes and edges. Therefore, as the depth of the model increases, the amount of computation and irregular memory access cause significant performance degradation and energy consumption, dominating execution time and energy expenditure. Nevertheless, accelerators designed for graph computation or for general neural networks cannot handle both patterns simultaneously. To address these problems, numerous studies on GNN accelerators have emerged. As GCN is the most typical representative of GNNs and all GNN variants are derived from it, this survey of GNN accelerators mainly focuses on those for GCN.
Because of the special execution pattern of GNNs, conventional architectures, i.e., CPUs and GPUs [47,48], are not fit for the work, suffering from workload imbalance, excessive memory access time, low energy efficiency, etc. The reason is explicit: real-world graphs often follow a power-law distribution, in the sense that most vertices are associated with only a few edges while a few vertices are associated with many. The irregularity of the aggregation phase inherently falls short in exploiting memory- and instruction-level parallelism on traditional processors. Besides, the dimension of GNN node feature vectors can be very high, e.g., each node in the Citeseer graph has 3703 features, which leads to substantial processing costs in the combination phase. Consequently, dedicated GNN accelerators are in great demand. The first accelerator for GNN applications is HyGCN [49], which designs a two-phase architecture to handle the two computation patterns respectively.
Since this first work, plenty of studies on accelerating GNNs have emerged. At present, Domain-Specific Architectures (DSAs) and emerging architectures, e.g., processing in memory (PIM), are developing rapidly. Hardware platforms such as Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) are in line with the demands of the times. With regard to acceleration strategies in the past few years, systolic arrays and ReRAMs were at first used frequently in general and PIM architectures respectively; then stage pipelining and on-chip buffers were also adopted. Besides, the co-design of algorithm and architecture, i.e., software and hardware, is the mainstream design plan, and in-storage architectures have also been applied. Mostly, the processing of the adjacency matrix is the central concern, because it is the cause of abundant memory access and energy consumption. In a sense, most works operate on DRAM, with some also trying methods at the cache level. But graph data and GNN scale are still increasing, which brings the scalability of GNNs into consideration.
To address the above significant aspects and gaps, this paper provides a comprehensive and systematic overview and survey of GNN accelerators:
● We provide a unified framework to categorize the studies on GNN accelerators, sorting out their architectures and methods.
● We introduce the latest work on GNN accelerators to give a clear picture of current research directions.
● We provide a comprehensive overview of the unique characteristics of GNN accelerators, as well as the challenges they incur.
● Lastly, open issues and prospects for GNN accelerator research are discussed.
The rest of this paper is organized as follows. Section 2 gives a detailed introduction to the basic components of GNNs and their accelerators, and briefly summarizes recent progress; at the same time, we give a classification and general framework of GNN accelerators. Challenges and characteristics are given in Section 3, and the solutions to these challenges are introduced in Section 4. We then review previous work and give directions for future research in Section 5. Finally, the paper concludes in Section 6.

2 Preliminaries

In this section, we give a brief introduction to the preliminaries of graphs, GNNs, and their acceleration architectures, including graph representation, several common GNN models, and architecture classification. Then we summarize some unique characteristics of graphs and GNNs, followed by related work on general-purpose processors. These characteristics and the related work further motivate our survey of GNN accelerators. Finally, we present a categorization framework of GNN accelerators for further analysis.

2.1 Graph

A graph is a data structure consisting of vertices and the edges associated with them. A graph, denoted by G, can be generally defined as G = (V, E), where V represents the vertex set and E the edge set. For a directed graph, an edge can be represented as e = (vi, vj), indicating an edge pointing from vi to vj. To represent E intuitively, we use the adjacency matrix, denoted by A, which encodes the neighbor relationships between vertices (a small sketch of this representation follows the feature list below). We provide definitions of basic graph concepts; for easy retrieval, the commonly used notations are summarized in Appendix A.
In particular, a vertex and an edge can also be attributed with single or multiple properties. Real-world natural graphs, e.g., social networks, usually have the following three general features.
● Power-law distribution. Only a few vertices are associated with most of the edges, leading to severe workload imbalance and many conflicts when high-degree vertices are aggregated.
● Sparsity. The average vertex degree is relatively small. The sparsity of graphs results in poor locality of data accesses, which causes irregular memory access and computation.
● Small-world structure. Any two vertices in the graph can be connected within a small number of hops. This feature makes it difficult to partition the graph efficiently.
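To make the representation concrete, the following minimal sketch (our illustration, assuming NumPy and SciPy) stores a small directed graph's adjacency matrix in compressed sparse row (CSR) form, the kind of sparse format most accelerators start from:

```python
# A small sketch (ours): storing a directed graph's adjacency matrix in CSR form.
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 2), (1, 2), (3, 0)]            # directed edges e = (v_i, v_j)
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(4, 4))

print(A.toarray())       # dense view of A; A[i, j] = 1 iff edge (v_i, v_j) exists
print(A.nnz / (4 * 4))   # density: 0.25 in this toy; far below 0.1% for graphs like Reddit
```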

2.2 Graph neural network

Different from a CNN, a GNN is a type of neural network that learns from graph-structured data and obtains accurate predictions on non-Euclidean data. It is a general term for algorithms that use neural networks to learn graph-structured data, extract and explore features and patterns in it, and meet the requirements of graph learning tasks such as clustering, classification, prediction, segmentation, and generation.
Early GNNs used RNNs to deal with undirected graphs, directed graphs, labeled graphs, and cyclic graphs. It was later believed that this first type of GNN could not handle the complex and changeable graph data found in reality. As a result, the CNN architecture was applied to graphs: through an ingenious transformation of the convolution operator, the graph convolutional network (GCN) was proposed, from which many variants derive. GCN realizes the translation invariance, local perception, and weight sharing of CNNs on graphs, providing ideological guidance and reference for the construction and improvement of other GNN frameworks. Besides, since GNN models are largely based on GCN and its variants, most accelerator work on GNNs is also based on GCN designs, and the subsequent introduction and analysis in this survey are based on GCN.
GCN has two main approaches to the convolution operation: one based on spectral decomposition, i.e., spectral graph convolution, and the other based on node-space transformation, i.e., spatial graph convolution. Because spectral GNNs transform all nodes in the graph at the same time, which requires large computing and storage resources and is difficult to adapt to the analysis of large-scale graph datasets, the subsequent introduction and analysis in this survey are based on spatial GCN, the main model optimized in accelerator work. The variant models are given in Appendix B.
Existing mainstream graph neural network algorithms can be abstracted into the following model:

h_v = Aggregate(h_u^(k-1), u ∈ N(v)),   (1)
h_v^(k) = Combine(h_v^(k-1), h_v),   (2)

where h_v^(k) denotes the feature vector obtained by updating node v in the kth layer. Overall, a graph neural network is a hybrid of traditional graph computation and a neural network. In the execution of each layer, the graph neural network first traverses the entire graph (or sampled subgraphs) and aggregates the neighborhood information for each node; the node's intermediate features h_v in this layer are obtained by aggregating the information of its neighbors. This process is similar to traditional graph computation, as in Eq. (1). The intermediate features are then combined with the node's features from the previous layer; the node's output features in this layer are obtained by updating the vector to h_v^(k), a process similar to that of traditional neural networks, as in Eq. (2). The entire process is shown in Fig.1, and a code sketch follows the figure.
Fig.1 The entire process of GNN [50]
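To ground Eqs. (1) and (2), here is a minimal dense sketch of one layer (our illustration; the two weight matrices and the ReLU activation are assumptions for concreteness, not a specific model from the text):

```python
# A minimal dense sketch of one GNN layer following Eqs. (1) and (2);
# the two weight matrices and the ReLU are illustrative assumptions.
import numpy as np

def gnn_layer(A, H_prev, W_self, W_agg):
    H_agg = A @ H_prev                   # Eq. (1): aggregate neighbors (graph-computation-like)
    return np.maximum(0, H_prev @ W_self + H_agg @ W_agg)  # Eq. (2): combine (MLP-like)

rng = np.random.default_rng(0)
A = (rng.random((4, 4)) < 0.5).astype(float)   # toy adjacency matrix, 4 nodes
H = rng.standard_normal((4, 8))                # 8 input features per node
H_next = gnn_layer(A, H, rng.standard_normal((8, 16)), rng.standard_normal((8, 16)))
print(H_next.shape)                            # (4, 16)
```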


2.3 Accelerator architecture

As one of the most popular research directions in artificial intelligence, GNNs have produced a large number of different algorithms. The main general programming model is shown in Algorithm 1. Many researchers have developed graph-oriented neural network frameworks and extension libraries on common platforms. Because graph neural network applications and traditional neural network applications have both similarities and differences in execution, much existing work extends mature neural network frameworks, e.g., PyTorch and TensorFlow, into new frameworks that support graph neural network applications. These frameworks and extension libraries support a variety of GNN algorithms, and most have been open-sourced so that users can flexibly construct GNNs on top of them. The three mainstream graph neural network frameworks and extension libraries are introduced below.


PyTorch Geometric (PyG) is currently the most commonly used general graph neural network extension library. It is based on the PyTorch framework, supports running on CPU and GPU platforms, and is open source. In addition to common graph-structured data processing methods, PyG also provides support for methods such as relational learning and 3D data processing. PyG offers users a universal message passing interface, enabling them to implement new graph neural network research ideas on top of it.
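As an illustration of that interface, a toy layer might look like the following (a hedged sketch; the class and parameter names are ours, and the PyG documentation remains the authoritative API reference):

```python
# A hedged sketch of PyG's message passing interface: a custom mean-aggregation
# layer (names are ours; consult the PyG documentation for the exact API).
import torch
from torch_geometric.nn import MessagePassing

class MeanConv(MessagePassing):
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='mean')                 # aggregate phase: mean over neighbors
        self.lin = torch.nn.Linear(in_dim, out_dim)   # combine phase: linear update

    def forward(self, x, edge_index):
        h = self.propagate(edge_index, x=x)           # gathers x_j along edges, then reduces
        return torch.relu(self.lin(h))                # update node features
```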
Deep Graph Library (DGL) is an open-source graph neural network extension library built on top of several existing neural network frameworks; it currently supports TensorFlow, PyTorch, and MXNet, minimizing the workload when users migrate graph neural network models across platforms. DGL abstracts the computation of a graph neural network as user-configurable message passing units, and extracts the connection between sparse matrix multiplication in graph neural networks and the message passing mechanism, integrating the computation into generalized sparse-dense matrix multiplication (g-SpMM) and generalized sampled dense-dense matrix multiplication (g-SDDMM). In addition, DGL introduces different parallel strategies, adopting node-level parallelism for g-SpMM and edge-level parallelism for g-SDDMM, giving it high execution speed and memory access efficiency.
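The distinction between the two kernels can be conveyed with dense NumPy analogs (our simplification; DGL's real g-SpMM/g-SDDMM are fused sparse kernels, not these toy versions):

```python
# Dense NumPy analogs (ours) of the two kernel families DGL lowers message passing to.
import numpy as np

def spmm(A, H):
    # g-SpMM analog: aggregate neighbor features into node features
    return A @ H                       # A: (N, N) adjacency, H: (N, F) node features

def sddmm(A, H):
    # g-SDDMM analog: compute one value per existing edge (e.g., attention logits)
    S = H @ H.T                        # pairwise node-pair scores
    return np.where(A != 0, S, 0.0)    # sampled at the sparse adjacency pattern
```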
AliGraph [51] is an open-source distributed framework for large-scale graph neural networks released by Alibaba Group. AliGraph's system is divided into three levels: operators, sampling, and data storage. The operator level decomposes different graph neural network algorithms into three operators: sample, aggregate, and combine, and optimizes their computation. The sampling layer integrates a variety of sampling strategies so that training samples can be generated quickly and accurately. The data storage layer implements efficient data organization and storage through flexible graph partitioning, separate storage of different attributes in the graph, and caching of frequently used data.
After introducing the software aspects of GNN acceleration, the hardware hierarchy must also be considered. Because of the graph-processing-like aggregation phase, the main hardware platforms of GNN accelerators are of three types: ASIC, FPGA (CPU-FPGA), and emerging devices such as PIM. General processors, i.e., CPUs and GPUs, cannot accelerate the GNN model directly because of the two-phase execution pattern, which requires dynamic computation and irregular memory access in the aggregate stage and the opposite in the combine stage. As for accelerator hardware architectures, there are unified and separate hardware designs.
The typical representatives of separate designs are HyGCN and GraphACT [52], which accelerate the aggregate and combine stages separately. HyGCN is an early two-part GNN inference accelerator using an ASIC, in which two main computational modules perform the aggregation operation and the combination operation, respectively. The aggregation part consists of an access controller with sparsity elimination and multiple Single Instruction Multiple Data (SIMD) cores, which can be flexibly configured to aggregate at high utilization. The combination part is a matrix multiplication unit composed of multiple configurable Systolic Array (SA) [53] structures, which can work independently or in concert. GraphACT proposes a GNN training system implemented by a host CPU plus an FPGA accelerator. The host, in addition to executing the softmax function and computing the loss function during training, preprocesses the GNN task with an elaborate matching algorithm, arranges the data in the way most conducive to reuse, and transfers it to the memory on the FPGA board; the FPGA part contains two major hardware modules, which perform vector and matrix computation, along with the corresponding on-chip caches.
EnGN [54] and AWB-GCN [26] are of the unified type, which execute the two-stage computation pattern on the same hardware. AWB-GCN focuses mainly on the problem of load imbalance in sparse matrix multiplication: three techniques, distribution smoothing, remote swapping, and special row remapping, are proposed for load balancing, with corresponding datapath support designed in hardware to improve the utilization of structurally uniform PE units. Since the sparse matrix in GNN computation is determined by the task, and the sparsity of each task differs, it also proposes an auto-tuning module that adjusts the deployment strategy of the next cycle based on the current cycle's execution, so that hardware utilization improves through auto-tuning when the GNN algorithm runs several times on the same dataset. EnGN proposes a Neural Graph Processing Unit (NGPU) to perform sparse or dense matrix multiplication in GNNs. Specifically, the NGPU consists of a two-dimensional array of interconnected Processing Elements (PEs), each containing a multiply-accumulate unit, a degree-aware feature cache, and a weight cache to better exploit temporal data reuse. In addition, each column of PEs is connected by a topology called Ring Edge Reduce (RER); through this datapath, together with preprocessing of the graph data, EnGN performs sparse matrix operations, aggregating the feature data through several cycles of data transfer.
PIM is an emerging technology used to accelerate GNN computation and memory access. ReRAM, as a representative, uses the crossbar to store values as conductances: inputs are converted by DACs into wordline voltages, each bitline accumulates the currents produced by the products of voltages and conductances, and ADCs gather the multiply-accumulate results. A detailed introduction appears in Section 4. The existing accelerator hardware architectures are summarized in Tab.1.
Tab.1 Current GNN acceleration architectures
Name | Model supported | Stage optimized | Platform | Hybrid or uniform
HyGCN | GCNs | Inference | ASIC | Hybrid
Auten et al. [55] | GCNs, GATs | Inference | ASIC | Hybrid
GraphACT | GCNs | Training | CPU-FPGA | Hybrid
DeepBurning-GL [56] | GCNs | Inference | FPGA | Hybrid
AWB-GCN | GCNs | Inference | ASIC | Uniform
GCNAX [28] | GCNs | Inference | ASIC | Uniform
Cambricon-G [57] | GCN, GraphSAGE | Both | ASIC | Uniform
BlockGNN [58] | GCNs, GATs | Inference | FPGA | Uniform
ReGraphX [59] | GNNs | Training | PIM | Uniform
I-GCN [30] | GCNs | Inference | ASIC | Hybrid
GRIP [60] | GCN, GraphSAGE, GIN | Inference | ASIC | Hybrid
EnGN [54] | GCNs, GRN | Inference | ASIC | Uniform
Huang et al. [27] | GCNs | Inference | PIM | Uniform
GCoD [29] | GCNs | Inference | ASIC | Uniform
ReGNN [61] | GNNs | Inference | ASIC | Hybrid
ReGNN [62] | GCNs | Inference | PIM | Hybrid
Graphite [31] | GNNs | Both | CPU | Uniform
SmartSAGE [63] | GraphSAGE | Training | CPU-FPGA | Uniform
CoGNN [64] | GNNs | Training | GPU | Uniform
PASGCN [34] | GCNs | Inference | PIM | Uniform
FlowGNN [65] | GNNs | Inference | ASIC | Hybrid
SGCN [33] | GCNs | Inference | ASIC | Hybrid
GROW [32] | GCNs | Inference | ASIC | Uniform
GraNDe [66] | GCNs | Inference | ASIC | Hybrid
GNNAdvisor [25] | GNNs | Both | GPU | Uniform
GNNLab [20] | GNNs | Training | GPU | Uniform
Degree-Quant [22] | GNNs | Both | CPU-GPU | Uniform
FlexGraph [21] | GNNs | Training | CPU | Uniform
SGQuant [23] | GNNs | Both | Memory-constrained devices | Uniform
QGTC [24] | GNNs | Both | GPU | Uniform
Xu et al. [35] | GCNs | Training | GPU | Uniform
Table 1 displays the holistic architectural parameters of existing GNN accelerators, including the model supported, the stage optimized, the platform, and whether the hardware handles the two computation phases with separate engines (hybrid) or on the same hardware (uniform).

2.4 Categories and frameworks

We conduct a systematic review of GNN accelerators, aiming to explore the key issues in their design and implementation. As summarized in Fig.2, we identify a complete set of core components of GNN accelerators, involving three major aspects: preprocessing, computation model, and scheduling strategies.
Fig.2 The design aspects of GNN accelerators


Preprocessing A GNN accelerator usually has limited storage resources in practice, so graphs need to be partitioned. Preprocessing is an important step applied to the graph data (e.g., the adjacency matrix) to make it fit into the memory capacity of the GNN accelerator. It is also the key to matching a given processing model with appropriate hardware resources before formal processing. Moreover, it can enhance data locality, yielding more regular memory access and swifter data communication.
Computation model The GNN computation model serves as the main execution part of GNN accelerator design. Parallelism often consists of vertex-intra and vertex-inter parallelism, both belonging to the aggregate phase. The order of computation is also considered here because the dimension of the vertex property differs between phases. The implementation of this part generally relies on a hardware platform, e.g., FPGA, ASIC, or Processing-In-Memory (PIM). Different specifications have different concerns regarding hardware design and sophisticated software co-design for high throughput and energy efficiency.
Scheduling strategies This part addresses how to schedule a large number of operations and data on the limited hardware resources of GNN accelerators. The scheduling component often involves data communication, execution mode, and the scheduling scheme, aiming for higher resource utilization to solve workload imbalance problems.

3 The characteristics and challenges

3.1 GNN architecture characteristics

From the above sections, we have gained basic knowledge of GNN accelerators. In summary, we outline three typical features of GNN architecture: a two-phase computation execution mode, complex datasets, and poor scalability.
Two-phase computation execution mode As described before, the GNN model is composed of two phases in every iteration: aggregate and combine. The aggregate phase resembles graph computation, gathering node properties along the connecting edges to obtain each node's new features, i.e., the message passing mechanism. The combine phase resembles neural network computation, applying weights, biases, and activation functions to update the node's features. As a result, GNN possesses the characteristics of graph computation and neural networks simultaneously, which cannot be efficiently implemented on conventional computational resources. As Tab.2 shows, a GNN accelerator must satisfy two opposite computation and memory access modes in one architecture.
Tab.2 Execution behaviors in GNNs [49]
 | Aggregation | Combination
Access pattern | Indirect and irregular | Direct and regular
Data reusability | Low | High
Computation pattern | Dynamic and irregular | Static and regular
Computation intensity | Low | High
Execution bound | Memory | Compute
Complex dataset To further understand GNN features, we show common GNN datasets in Tab.3. As the table indicates, GNN datasets consist of considerable numbers of edges and nodes with multi-dimensional features, different from the one-dimensional node properties of traditional graph processing applications. Moreover, following the power-law distribution, the sparsity of a graph's adjacency matrix can reach 99.9%, which makes GNN computation and memory access dynamic and irregular and causes longer execution time and higher energy consumption, as shown in Fig.3. The weights in GNN, by contrast, are dense and regular, which puts GNN in a dilemma for keeping computation balanced.
Tab.3 Current common GNN dataset
Name | Nodes | Edges | Features | Storage | Classes
Pubmed (PB) | 19717 | 88648 | 500 | 38 MB | 3
Cora (CR) | 2708 | 10556 | 1433 | 15 MB | 7
Citeseer (CS) | 3327 | 9104 | 3703 | 47 MB | 6
Reddit (RD) | 232965 | 11465892 | 602 | 1.8 GB | 41
NELL (NE) | 65755 | 266144 | 5414 | 1.3 GB | 210
Fig.3 The non-zero elements of GNN adjacent matrix (a) and weight matrix (b) [26]


Poor scalability Since GNNs share features with CNNs, why not use more layers to get more accurate predictions? As shown in Fig.4, one layer gathers information from one hop of neighbors; iterating over many layers to gather more hops makes the amount of computation and memory access huge, which is bounded by the platform and also causes overfitting, severely unbalanced workload, etc.
Fig.4 An example of inference on vertex B using a two-layer GCN. The nodeflow describes the propagation of features within a message-passing layer (MPL) [60]. (a) Input graph; (b) nodeflow; (c) inference dataflow


3.2 GNN accelerators design challenges

Based on the various GNN features above, there are five main challenges in designing GNN accelerators.
Parallelism In the GNN model, parallelism is mainly exploited in the aggregate phase, which accounts for most of the execution time, as shown in Fig.5. Fig.5 illustrates the execution time ratio of the two phases on different models and datasets; the ratios differ due to the variable length of feature vectors and the execution flow of GCNs. To accelerate GNNs, we can utilize vertex-intra and vertex-inter parallelism to improve computation performance, as shown in Fig.1. Vertex-intra parallelism aggregates all of a destination node's neighbor properties in parallel; vertex-inter parallelism processes the aggregate phase of every destination node simultaneously (see the sketch after Fig.5). Nevertheless, making efficient use of both kinds of parallelism is a problem; especially in PIM architectures [34], the two parallelism modes are bounded by crossbar utilization. Leveraging both is also a challenge at the software and hardware level, because the two parallelisms must be balanced against the variable length of feature vectors and the execution flow of GCNs.
Fig.5 Execution time breakdown of the two phases [49]
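The following sketch (our illustration, not a kernel from any particular accelerator) marks where the two parallelism modes appear in a CSR-based aggregation:

```python
# Illustrative sketch (ours) of the two aggregate-phase parallelism modes.
import numpy as np

def aggregate(indptr, indices, H):
    N, F = len(indptr) - 1, H.shape[1]
    out = np.zeros((N, F))
    # Vertex-inter parallelism: each destination node's aggregation is independent,
    # so iterations of this loop can be distributed across PEs or threads.
    for v in range(N):
        nbrs = indices[indptr[v]:indptr[v + 1]]
        # Vertex-intra parallelism: the reduction over one node's neighbors (and
        # across the F feature lanes) can itself be parallelized, e.g., with SIMD lanes.
        out[v] = H[nbrs].sum(axis=0)
    return out
```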


Data locality As noted before, the adjacency matrix A is sparse and random, with poor locality, so the aggregate phase incurs considerable dynamic computation and irregular memory access. We therefore seek ways to enhance locality and decrease the energy consumed by memory accesses. In existing works, data locality mainly concerns optimizations of the adjacency matrix, because graphs following power-law distributions have many zeros in their adjacency matrices, resulting in plenty of useless computation. Yan et al. [49] adopt dynamic sparsity elimination to reduce redundant accesses, comprising interval-shard partitioning, interval-wise feature access, window sliding, and window shrinking. However, the operational complexity is considerable.
Data reuse Data reuse mainly focuses on the reuse of node features. Graphs following power-law distributions have the characteristic that a few nodes have most of the neighbors while most nodes have few. Hence, if these few nodes and their neighbors' features can be reused in GNN computation, much redundant storage and computation can be saved. Moreover, the aggregate phase has much redundancy in computation and communication, as many node features are reused in aggregation many times. Consequently, improving the reuse of node properties accelerates both phases and reduces the pressure on caches. But GNN datasets are large and sparse, which makes this challenging.
Utilization of resources Apart from the sparsity of the adjacency matrix, workload imbalance is an obvious problem when executing GNNs: some vertices have many neighbors while others have only a few, so during the aggregate phase the computation resources are monopolized by high-degree nodes. Hence, under the power-law distribution, many processing elements sit idle while others stay busy, wasting time. Resource scheduling in computation is needed to address this.
Single stage optimized From Tab.1, we can see that most GNN accelerators only accelerate the training stage or the inference stage; only a few support the holistic process. This is because much work optimizes a single stage due to the different computation patterns of GNN training and inference. Accelerators supporting both training and inference need more complicated algorithms and architectures. To be exact, a GNN accelerator differs from a CNN accelerator, which only considers forward inference and backward propagation in each iteration; a GNN accelerator must also handle the aggregate phase during training, which is another big challenge.

4 Detailed works

Given the above challenges, we now introduce existing works that handle these problems in detail, from the following three aspects.

4.1 Optimizations on memory

Graphite [31] still uses the CPU to accelerate GNNs, analyzing their execution behavior on CPUs. As shown in Fig.6, the main performance bottleneck is memory-bound execution: only 10% of the pipeline slots are attributable to useful work, and in 62% of the slots the pipeline is stalled waiting for loads and stores. Since aggregation performs a simple reduction for each vertex after gathering its neighbors' feature vectors, the operation is highly memory-intensive. Graphite therefore offloads the memory-intensive phase from the CPU to DMA (direct memory access) engines; reducing main-memory bandwidth pressure appears to be the key to optimizing GNN workloads on CPUs.
Fig.6 Breakdown of the pipeline slots spent on retiring micro-ops or stalled by different bottlenecks during a full-batch training of GraphSAGE on a CPU [31]


GNNAdvisor [25] is an efficient runtime system that systematically accelerates GNN applications on GPUs, as shown in Fig.7. The work leverages performance gains from GNN input-level properties, such as communities in graphs, node degree, and the dimensionality of node embeddings, to implement node renumbering, group-based workload partitioning, and dimension-based workload sharing. These optimizations are tailored for GNN computation to balance intra-thread efficiency and inter-thread parallelism, reducing high-overhead memory operations. Besides, a modeling and estimating strategy automates the optimization process with little manual effort, exposing a set of performance-related parameters for flexible user tuning.
Fig.7 Overview of GNNAdvisor [25]
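As one illustration of group-based workload partitioning, the sketch below (our simplification in the spirit of GNNAdvisor, with hypothetical names, not its actual implementation) splits each node's neighbor list into fixed-size groups so that GPU threads receive comparable work:

```python
# Our simplified sketch of group-based neighbor partitioning: split each node's
# neighbor list into fixed-size groups so threads get roughly even work.
def make_neighbor_groups(indptr, indices, group_size=4):
    groups = []                                  # entries: (target node, neighbor slice)
    for v in range(len(indptr) - 1):
        start, end = indptr[v], indptr[v + 1]
        for g in range(start, end, group_size):
            groups.append((v, indices[g:min(g + group_size, end)]))
    return groups

# Each group becomes one unit of work, so a high-degree node no longer
# monopolizes a single thread while low-degree nodes leave others idle.
```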


GNNLab [20] is a sample-based GNN training system for a single-machine multi-GPU setup. It contributes a new factored space-sharing design for sample-based GNN training that reduces intra-task resource contention and unleashes inter-task data locality, together with solutions to the load imbalance the factored design introduces. It also provides a general GPU-based feature caching scheme, with a caching policy based on pre-sampling that is robust to diverse sampling algorithms and GNN datasets, optimizing GNN GPU memory usage. The insights behind these challenges and characteristics are shown in Fig.8.
Fig.8 A breakdown of memory usage and data similarity for different stages of the SET model when training OGB-Papers over multiple GPUs (G0, G1,...) with 16 GB of memory each [20]


Another work, SmartSAGE [63], trains large-scale GNNs using in-storage processing (ISP) architectures, targeting large-scale datasets in real production settings. State-of-the-art (SOTA) ML frameworks employ an in-memory processing model that significantly hampers the productivity of ML practitioners, as it mandates that the overall working set fit within DRAM capacity. Hence, the authors explore the feasibility of utilizing capacity-optimized NVMe SSDs for storing memory-hungry GNN data, enabling large-scale GNN training beyond the limits of main memory size.
To reduce data redundancy in the aggregate phase, ReGNN [61] proposes a dynamic redundancy-eliminated neighborhood message passing algorithm for GNNs, exploiting the redundancy in the aggregate phase shown in Fig.9. The algorithm finds redundancy sets in advance and then reuses data within them to reduce memory pressure; ReGNN also provides hardware support for the algorithm. The holistic architecture, shown in Fig.10, can be reconfigured by adjusting the corresponding switches.
Fig.9 Illustrative examples of EdgeUpdate redundancy (ER) and Aggregation redundancy (AR) [61]
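The following sketch (our simplification; real ReGNN discovers shared neighbor subsets dynamically, whereas this toy version only reuses identical neighbor sets) conveys the idea of caching a partial aggregation so several target nodes can share it:

```python
# A toy sketch (ours) of aggregation-redundancy reuse: if several target nodes
# have identical neighbor sets, aggregate that set once and share the result.
import numpy as np

def aggregate_with_reuse(neighbors, feats):
    cache, out = {}, {}                       # frozenset of neighbors -> partial sum
    for v, nbrs in neighbors.items():
        key = frozenset(nbrs)
        if key not in cache:
            cache[key] = feats[list(nbrs)].sum(axis=0)
        out[v] = cache[key]                   # reuse the cached partial aggregation
    return out

feats = np.arange(12.0).reshape(4, 3)
print(aggregate_with_reuse({0: (1, 2), 3: (1, 2)}, feats))  # set (1, 2) summed once
```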


Fig.10 Overview of ReGNN [61]


I-GCN [30] and GCoD [29] are two typical works using data locality to accelerate GNNs. I-GCN uses a mechanism called islandization, a new online graph restructuring algorithm that clusters nodes with strong internal but weak external connections. The islandization process yields two major benefits. First, by processing islands rather than individual nodes, there is better on-chip data reuse and fewer off-chip memory accesses. Second, there is less redundant computation, since the aggregation of common/shared neighbors within an island can be reused. This is done without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations. The whole process is shown in Fig.11.
Fig.11 Overview of I-GCN [30]


GCoD [29] integrates a split-and-conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without affecting model accuracy, so that the graph adjacency matrices have merely two types of workload and benefit from largely improved regularity and thus ease of acceleration; the overview is in Fig.12. GROW [32] treats GCN as sparse-dense matrix multiplication and utilizes a uniform architecture for GCN inference. Its main idea is to employ a row-stationary dataflow based on row-wise product matrix multiplication (i.e., Gustavson's algorithm [67]), allowing flexible, fine-grained adaptation to the two heterogeneous sparsity patterns, which addresses inefficient data movement and significantly reduces wasted memory bandwidth. Besides, it puts forward a multi-row stationary runahead execution model to maximize memory-level parallelism and overall throughput. The architecture of GROW is exhibited in Fig.13.
Fig.12 Overview of GCoD [29]


Fig.13 Overview of GROW [32]
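For reference, row-wise product matrix multiplication can be sketched as follows (a simplified software analog of the dataflow GROW builds on, not GROW's hardware pipeline):

```python
# A simplified software analog (ours) of row-wise product matrix multiplication
# (Gustavson's algorithm) for C = A @ B with A sparse in CSR form and B dense.
import numpy as np

def row_wise_spmm(indptr, indices, data, B):
    n, k = len(indptr) - 1, B.shape[1]
    C = np.zeros((n, k))
    for i in range(n):                          # output row i stays "stationary"
        for p in range(indptr[i], indptr[i + 1]):
            j, a_ij = indices[p], data[p]       # nonzero A[i, j]
            C[i] += a_ij * B[j]                 # scale row j of B and accumulate
    return C
```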


SGCN [33] exploits the compressed-sparse features of the input to compress data in a way suitable for GCNs. The work applies its compression model not only to the aggregate phase, which most works focus on, but also to the combine phase, reducing off-chip memory traffic. To better handle data locality under varying sparsity, SGCN proposes sparsity-aware cooperation. The basic view of SGCN is displayed in Fig.14. The work mainly targets GCNs with deeper, more residual layers, where intermediate feature sparsity offers greater potential.
Fig.14 Overview of SGCN [33]


4.2 Optimizations on computation

Work on accelerating GNN computation focuses on scheduling schemes, dataflow, graph tiling, graph partitioning, and graph reordering. GraphACT [52], AWB-GCN [26], GCNAX [28], EnGN [54], and GRIP [60] are typical works in this direction.
GraphACT proposes a GNN training system implemented by a host CPU plus an FPGA accelerator. The host, in addition to computing results such as the softmax function and the loss function during training, preprocesses the GNN task with an elaborate matching algorithm, arranges the data to be computed in the way most conducive to reuse, and transfers it to the memory on the FPGA board. The FPGA part contains two major hardware modules, which perform vector computation and matrix computation, as well as the corresponding on-chip caches. Since the work targets training, the preprocessing on the host can be performed in parallel with the task computation on the FPGA accelerator, making it more efficient. Fig.15 shows the timing flow of the system for a GNN training task.
Fig.15 Per-minibatch scheduling between CPU and FPGA (up), and between FPGA computation modules (down) [52]


AWB-GCN focuses on the problem of load imbalance in sparse matrix multiplication, proposing three techniques, distribution smoothing, remote swapping, and special row remapping, for load balancing, and designing the corresponding datapath support in hardware to improve the utilization of structurally uniform PE units. Since the sparse matrix in the computation is determined by the task, and the sparsity of each task differs, the paper also proposes an auto-tuning module that adjusts the deployment strategy of the next cycle based on the current cycle's execution, so that hardware utilization improves through auto-tuning when the GNN algorithm runs several times on the same dataset.
GCNAX proposes a flexible and optimized dataflow for GCNs that simultaneously improves resource utilization and reduces data movement. This is realized by fully exploring the design space of GCN dataflows and evaluating the number of execution cycles and DRAM accesses through an analysis framework. As shown in Fig.16, GCNAX compares the two execution orders of the operations in a GCN layer to further study the dataflow and execution mode of GNNs.
Fig.16 The number of operations for the five datasets (first layer) using the two execution orders [28]
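The effect of the execution order can be estimated with a quick back-of-the-envelope count (our sketch; the hidden size C is an assumed value, and normalization and self-loops are ignored), using Cora's statistics from Tab.3:

```python
# Back-of-the-envelope multiply-accumulate counts (ours) for the two execution
# orders of a GCN layer, using Cora's statistics from Tab.3; the hidden size C
# is an assumed value.
N, F, C, nnz = 2708, 1433, 16, 10556     # nodes, input features, hidden size, edges

aggregate_first = nnz * F + N * F * C    # (A @ X) @ W: aggregate on F-dim features first
combine_first   = N * F * C + nnz * C    # A @ (X @ W): shrink features to C dims first

print(f"{aggregate_first:,} vs {combine_first:,}")   # 77,215,772 vs 62,257,920 here
```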


FlexGraph [21] classifies GNNs according to neighborhood definitions and aggregation patterns, identifying the key expressivity and performance challenges that GNN frameworks face when training GNN models. The work uses DNFA (direct neighbors with flat aggregation), INFA (indirect neighbors with flat aggregation), and INHA (indirect neighbors with hierarchical aggregation) to define GNN models based on both static and operational characteristics. Moreover, it provides a programming abstraction named NAU (NeighborSelection, Aggregation, and Update) that supports these categories. It then leverages a mixed execution scheme for hierarchical aggregation, as well as an application-driven workload balancing strategy and a pipeline processing strategy for efficient distributed training, thereby optimizing GNN computation. The entire process is shown in Fig.17.
Fig.17 FlexGraph architecture [21]


To accelerate the aggregate computation of backward propagation in GCN training, Xu et al. [35] propose an execution path preparing approach that reduces the computation in training, after discovering that GCN training can be equivalently transformed into a partially-active graph processing procedure from the graph processing perspective, as shown in Fig.18. Moreover, they propose a structure-aware way of computing the optimal group sizes for the execution paths, after studying the effective group sizes for performing backward aggregation on different execution paths. They compare their work with GNNAdvisor on GCN training and achieve better performance.
Fig.18 Execution paths of backward aggregations in two layers on the example graph [35]


GRIP emphasizes the concept of nodeflow: in GNN inference it is usually unnecessary to compute the output of all nodes, only a few; from these few output nodes, the connection relations of the graph can be traced backwards to derive the input nodes and their connections at each layer, constituting a minimal computation subgraph. GRIP designs multiple caches and corresponding prefetching modules to process nodeflow data, which are sent via a crossbar to multiple units performing the aggregation of multiple nodes in parallel; the node feature transformation part is accelerated by weight-chunking caches. The overall architecture is shown in Fig.19. This work can support multiple GNN variants by combining the dataflow between the intermediate caches, and achieves spatial reuse of data as much as possible through multiple caches and computational units, although the crossbar is not sufficiently optimized and temporal reuse of data is less considered.
Fig.19 Overview of GRIP [60]


To accelerate quantized graph neural networks via GPU Tensor Cores, QGTC [24] introduces a novel quantized low-bit arithmetic design based on low-bit data representation and bit-decomposed computation, which supports QGNNs with diverse precision demands. It also leverages subgraph partitioning and batching, as well as zero-tile jumping, to optimize computation on top of GPU Tensor Cores. The main process is shown in Fig.20. QGTC implements its techniques at the software, GPU kernel, framework, and algorithm levels.
Fig.20 The main framework of QGTC [24]


Another quantization work on GNNs is Degree-Quant [22]. Through quantization-aware training for GNNs, it designs a method based on degree characteristics and makes corresponding optimizations. The work discovers and explains the sources of accuracy degradation in GNNs when using lower-precision arithmetic, as shown in Fig.21. It then analyzes how the choice of straight-through estimator (STE) implementation, node degree, and the method for tracking quantization statistics during training impact performance; the analysis is theoretically comprehensive and achieves good performance with INT8 and INT4 computation.
Fig.21 High-level view of the stochastic element of Degree-Quant. Masked (high in-degree) nodes, in green, operate at full precision, while unmasked nodes (red) operate at reduced precision. High in-degree nodes contribute most to poor gradient estimates, hence they are stochastically masked more often [22]
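As background for the STE mentioned above, a minimal symmetric fake-quantization with a straight-through estimator might look like this (our sketch of the generic building block behind quantization-aware training, not Degree-Quant's exact masking scheme):

```python
# Our sketch of symmetric fake-quantization with a straight-through estimator (STE);
# illustrative only, not Degree-Quant's exact scheme.
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax          # per-tensor scale
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None      # STE: pass gradients through round() unchanged

w = torch.randn(8, requires_grad=True)
w_q = FakeQuant.apply(w)           # quantize in forward, identity gradient in backward
```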


SGQuant [23] is a typical work on quantization for memory optimization. The work investigates a multi-granularity quantization strategy that operates at different levels (components, graph topology, and layers) of GNN computation, and offers automatic bit selection (ABS) to pinpoint the most appropriate quantization bits for these multi-granularity quantizations, as shown in Fig.22. It proposes component-wise, topology-aware, and layer-wise quantization to meet diverse data precision demands; moreover, end-to-end automatic bit selection makes the most appropriate choice for the different quantization granularities.
Fig.22 Multi-Granularity Quantization: (a) Component-wise, (b) Topology-aware, (c) Layer-wise, and (d) Uniform Quantization. NOTE: the same color represents the same quantization bit [23]


EnGN is a typical representative of the unified architecture; this work proposes a Neural Graph Processing Unit (NGPU) to perform sparse or dense matrix multiplication in GNNs. Specifically, the NGPU consists of a two-dimensional array of interconnected Processing Elements (PEs); each PE contains a multiply-accumulate unit, a degree-aware feature cache, and a weight cache, which better exploit temporal data reuse. In addition, each column of PEs is connected by a topology called Ring-Edge-Reduce (RER); through this datapath, together with preprocessing of the graph data, EnGN performs sparse matrix operations, aggregating feature data through several cycles of data transfer. Its advantages are that the degree-aware cache reduces the number of accesses and the 2D array structure has better circuit characteristics; its shortcomings are that the aggregation operation of GNNs is not well supported and the RER structure incurs additional overhead for aggregation.
FlowGNN [65] considers real-time GNN inference, where the amount of work cannot be known in advance, i.e., the workload is agnostic. The fundamental architecture is depicted in Fig.23. The architecture can process multiple nodes and edges simultaneously, enabled by an NT-to-MP adapter via on-the-fly multicasting. FlowGNN analyzes the dataflow in GNN models, identifying limitations in existing acceleration architectures. Consequently, it proposes an explicit message passing mechanism that dramatically improves the generality of the GNN accelerator, and multi-level parallelism (inter-node, intra-node, and inter-edge, across both message passing and node transformation) via a multi-queue-based dataflow that notably improves performance without losing generality.
Fig.23 Overview of FlowGNN [65]


4.3 Optimizations on emerging technologies

The processing in memory (PIM) architecture is booming for memory-intensive workloads. ReRAM, as a representative, is mainly composed of wordlines, bitlines, DACs, ADCs, etc., and can be used to accelerate GNN aggregation. As shown in Fig.24, (a) is the basic construction of ReRAM and (b) is its workflow. This architecture can efficiently accelerate matrix-vector multiplications (MVM): the weight matrix is generally mapped onto the resistive crossbar; input data passes through DACs to become wordline voltages; each bitline then accumulates the currents given by the products of voltages and conductances; and ADCs convert the accumulated currents back into digital results. At present, the typical PIM works on GNNs are Huang et al. [27], ReGNN [62], and PASGCN [34,50]. Huang et al. [27] propose a new PIM accelerator called REFLIP, shown in Fig.25. First, REFLIP adopts ReRAM crossbar structures to build a unified architecture supporting both GNN phases. Second, REFLIP adopts a flipped-mapping scheme that exchanges the positions of weights and inputs, reducing the sparsity of the adjacency matrix and improving massive crossbar-structured parallelism. The scheme is shown in Fig.26.
Fig.24 The ReRAM architecture [28]
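An idealized numerical sketch of such a crossbar MVM is given below (our illustration, assuming perfect devices; real designs must cope with DAC/ADC precision, IR drop, and device variation):

```python
# An idealized numerical sketch (ours) of a ReRAM crossbar matrix-vector multiply;
# perfect devices are assumed.
import numpy as np

def crossbar_mvm(G, v_in, adc_bits=8):
    i_out = v_in @ G                              # bitline currents: Ohm's law + Kirchhoff sums
    scale = np.abs(i_out).max() / (2 ** adc_bits - 1) + 1e-12
    return np.round(i_out / scale) * scale        # ADC digitizes the accumulated currents

G = np.random.rand(64, 64)     # weight matrix mapped onto crossbar conductances
v = np.random.rand(64)         # DAC-driven wordline voltages encode the input vector
print(crossbar_mvm(G, v)[:4])
```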


Fig.25 The REFLIP architecture [27]


Fig.26 Comparing and contrasting the classic graph mapping and our flipped mapping for GCNs. (a) A sample graph, (b) traditional graph processing mapping scheme that maps sparse edge data into crossbar arrays, and (c) the flipped-mapping scheme in REFLIP that maps multi-dimensional vertex features into crossbar arrays. The notation ei,j represents an edge pointing from the vertex i to the vertex j [27]


ReGNN [62] is composed of analog PIM (APIM) and digital PIM (DPIM) modules for accelerating MVM operations and non-MVM aggregate operations, respectively. ReGNN maps data to aggregation sub-engines based on vertex degree and feature vector dimension to increase data parallelism. The architecture of ReGNN is shown in Fig.27.
Fig.27 Overview of ReGNN [62]


The latest PIM work on GNNs is PASGCN [34,50]. This work fully utilizes vertex-intra and vertex-inter parallelism by employing dense data mapping and a search-execute architecture with acceptable crossbar cost, as shown in Fig.28. There are also two scheduling strategies for maximizing vertex-inter parallelism: optimal scheduling is reduced to a maximum independent set problem, which is solved by an algorithm called node-grouping. Besides, this work explores the task-irrelevant information in graphs and proposes an adaptively sparsified GCN network named ASparGCN. ASparGCN exploits a multilayer perceptron (MLP)-based edge predictor to obtain edge selection strategies for each GCN layer separately and adaptively during training, and performs inference with only the selected edges at test time. The whole architecture is shown in Fig.29.
Fig.28 GCN design with the two design patterns. (a) Adjacent matrix; (b) GCN design with general MM (GEMM) crossbars; (c) GCN design with CAM crossbars and MAC crossbars [34]


Fig.29 Overview of PASGCN [34]


GraNDe [66] uses a near-data processing (NDP) architecture to map matrices adaptively for GCNs. Its architecture is displayed in Fig.30. The aggregate phase is offloaded from the general processing units to NDP modules located on each DIMM's buffer chip, one per rank. Placing the processing elements (NDP modules) near the DRAM datapath exploits rank-level parallelism: with the buffer device in the middle of the DRAM datapath, multiple ranks can transfer data to the processing units concurrently. Besides, by exploring how the operand matrices are mapped to DRAM ranks, the paper finds that the optimal mapping differs depending on the configuration of the specific GCN layer.
Fig.30 Overview of GraNDe [66]


5 Future directions and reviews

On the basis of this extensive survey of GNN accelerators, we have learned the characteristics of GNNs, the challenges in designing GNN accelerators, and the related works addressing these problems. We now review these works. In non-PIM architectures, most works act on the adjacency matrix, which is sparse and follows a power-law distribution, e.g., by enhancing locality and reducing data redundancy. These works also tackle workload imbalance through hardware design or scheduling schemes to improve resource utilization, and some optimize the dataflow (nodeflow). In PIM architectures, computation efficiency is provided by the structure itself, so the core works concern parallelism and balanced resource utilization. To be exact, the main works target the aggregate phase, which is the performance and energy consumption bottleneck of GNNs.
From our perspective, the following aspects deserve future work.
● Previous hardware acceleration works mainly accelerate only one of the two phases in GNN, especially the aggregate phase, which is the key bottleneck. We should take a holistic view and design GNN hardware accelerators that support optimization both within and between the stages. "Within these stages" covers optimizations on aggregation, combination, and each of them individually; "between these stages" means optimizations on the dataflow, communication, and buffering mechanisms between aggregation and combination.
● Data compression [22-24]. Compressing the data can bring notable performance improvement, and we can further explore the effects of data precision and model pruning on GNNs.
● Hardware acceleration methods for traditional platforms. Although many works target traditional platforms such as CPUs [31] and GPUs [64], only a few address the hardware aspects. We can still explore balanced workloads, performance-resource models [56,58], and substitute elements that compensate for the drawbacks of traditional algorithms and software platforms.
● Further optimizations on the adjacency matrix. We can further explore adjacency matrix locality, using more accurate methods to retain the useful information.
● Explorations on PIM. ReRAM has proven successful as a GNN platform. We can therefore explore more PIM architectures designed specifically for GNN acceleration.
● Dataset characteristics analysis. Some previous works [25,29] address dataset processing, but more characteristics remain to be excavated in GNN datasets, e.g., real social network datasets following power-law distributions, and other such datasets should be further sought out. We believe GNNs can be optimized for various realistic datasets.

6 Conclusions

This paper first introduces the basics of graph neural network applications, common algorithms, application scenarios, programming models, and the mainstream frameworks and extension libraries on common platforms. Taking the accelerator design challenges posed by the execution behavior of graph neural networks as a starting point, it then presents a detailed and systematic analysis of the key optimization techniques in this field, covering overall structure design, computation, memory, and emerging technologies. To face these challenges, we introduced the related works that handle these issues. Furthermore, this paper presents the key optimization techniques and future directions in the design of GNN accelerators, provides a clear understanding of the current state of research in this field, and aims to inspire researchers in the design of accelerator structures.

Nan Sun received the BS degree in Detection Guidance and Control from Northwestern Polytechnical University, China in 2018 and the MS degree in Aeronautical and Astronautical Science and Technology from Harbin Institute of Technology, China in 2020. He is currently pursuing the PhD degree with the School of Cyber Science and Technology, Beihang University, China. His research interests include attack detection and federated learning

Wei Wang received her BEng degree in Electrical Engineering and Automation from Beihang University, China in 2005, MSc degree in Radio Frequency Communication Systems with Distinction from the University of Southampton, UK in 2006, and PhD degree from Nanyang Technological University, Singapore in 2011. From January 2012 to June 2015, she was a lecturer with the Department of Automation at Tsinghua University, China. Currently, she is an associate professor with the School of Automation Science and Electrical Engineering at Beihang University, supported by the BUAA Young Talent Recruitment Program. Her research interests include adaptive control of uncertain systems, distributed cooperative control of multi-agent systems, fault tolerant control, secure control of cyber-physical systems, and flight control systems

Yongxin Tong received the PhD degree in Computer Science and Engineering from The Hong Kong University of Science and Technology, China in 2014. He is currently a professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include federated learning, privacy-preserving data analytics, big spatio-temporal data analytics, crowdsourcing, and reinforcement learning. He has published more than 100 papers in prestigious international journals (e.g., ACM TODS, IEEE TKDE, and VLDBJ) and conferences (e.g., SIGMOD, SIGKDD, VLDB, and ICDE). He is an associate editor for IEEE Transactions on Knowledge and Data Engineering (TKDE), etc. He was named an Alibaba DAMO Academy Young Fellow in 2018, and received the Excellent Demonstration Award from VLDB 2014 and the championship of KDD Cup 2020

Kexin Liu received the MSc degree in Control Science and Engineering from Shandong University, China in 2013, and the PhD degree in System Theory from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China in 2016. From 2016 to 2018, he was a postdoctoral fellow at Peking University, China. Currently, he is an associate professor with the School of Automation Science and Electrical Engineering, Beihang University, China. His research interests include multi-agent systems and complex networks


Acknowledgements

This work was supported in part by the National Key R&D Program of China (2018AAA0101100), and in part by the National Natural Science Foundation of China (Grant Nos. 62022008 and 92067204).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

© 2024 Higher Education Press