Blockchain based federated learning for intrusion detection for Internet of Things

Nan SUN, Wei WANG, Yongxin TONG, Kexin LIU

Front. Comput. Sci. ›› 2024, Vol. 18 ›› Issue (5) : 185328. DOI: 10.1007/s11704-023-3026-8
Artificial Intelligence
RESEARCH ARTICLE


Abstract

In the Internet of Things (IoT), data sharing among different devices can improve manufacturing efficiency and reduce workload, yet it also makes network systems more vulnerable to various intrusion attacks. There is a realistic demand to develop efficient intrusion detection algorithms for connected devices. Most existing intrusion detection methods are trained in a centralized manner and are incapable of identifying new, unlabeled attack types. In this paper, a distributed federated intrusion detection method is proposed, which utilizes the information contained in the labeled data as prior knowledge to discover new unlabeled attack types. Besides, the blockchain technique is introduced into the federated learning process for the consensus of the entire framework. Experimental results show that our approach can identify malicious entities while outperforming existing methods in discovering new intrusion attack types.


Keywords

intrusion detection / federated learning / new attack discovery / blockchain

Cite this article

Nan SUN, Wei WANG, Yongxin TONG, Kexin LIU. Blockchain based federated learning for intrusion detection for Internet of Things. Front. Comput. Sci., 2024, 18(5): 185328 https://doi.org/10.1007/s11704-023-3026-8

1 Introduction

In recent years, deep learning models such as convolutional neural networks (CNNs), transformers, and recurrent neural networks (RNNs) have performed well on Euclidean-structured data, achieving outstanding prediction results. In practice, while deep learning has gained great success on Euclidean data, an increasing number of applications generate data from non-Euclidean domains that need to be analyzed. Graph data is a typical representative of non-Euclidean structures: it contains abundant structural information, allows each node to be updated with the properties of its neighbors, and its structure matches how information propagates in the real world. To obtain more accurate predictions on graph data, the graph neural network (GNN) [1-25] came into existence. A GNN is a deep learning model for handling graph data, which brings machine learning to graph-structured data. Various GNN models have since been proposed: graph convolutional networks (GCNs) [7-19,26-35], graph recurrent networks (GRNs) [4], graph attention networks (GATs) [2], graph auto-encoders (GAEs) [1], graph generative networks (GGNs) [3], etc. These GNN models play an important role in applications such as node classification, graph clustering, link prediction, node clustering, protein network analysis, and so on.
The execution of GNN models integrates the features of graph computation [36-46] and neural networks: it aggregates neighbor nodes' information through the product of the adjacency matrix and the vertex properties, and updates each vertex's property through a multilayer perceptron (MLP). As a result, the GNN model exhibits hybrid execution patterns in computation and memory access: the aggregation phase features irregular memory access and computation, while the combination phase is regular in both. Moreover, GNN datasets are composed of real-world graph data containing enormous numbers of nodes and edges. Therefore, as the depth of the model increases, the amount of computation and irregular memory access cause significant performance degradation and energy consumption, dominating execution time and energy expenditure. Nevertheless, accelerators designed for graph computation or for general neural networks cannot handle both patterns simultaneously. To address these problems, numerous studies on GNN accelerators have emerged. As GCN is the most typical representative of GNNs and all GNN variants are derived from it, this survey of GNN accelerators mainly focuses on those for GCN.
Because of the special execution pattern of GNNs, conventional architectures, i.e., CPUs and GPUs [47,48], are not fit for the work, suffering from workload imbalance, excessive memory access time, low energy efficiency, etc. The reason is explicit: real-world graphs often follow a power-law distribution, in the sense that most vertices are associated with only a few edges while a few vertices are associated with many. The irregularity of the aggregation phase inherently falls short in exploiting memory- and instruction-level parallelism on traditional processors. Besides, the dimension of GNN node feature vectors can be very high, e.g., each node in the Citeseer graph has 3703 features, which leads to substantial processing costs in the combination phase. Consequently, dedicated GNN accelerators are in great demand. The first accelerator for GNN applications is HyGCN [49], which designs a two-phase architecture to handle the two computation patterns respectively.
Since this first work, plenty of studies on accelerating GNNs have emerged. At present, Domain-Specific Architectures (DSAs) and emerging architectures, e.g., processing in memory (PIM), are developing rapidly. Hardware platforms such as Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) are in line with the demands of the times. With regard to acceleration strategies in the past few years, systolic arrays and ReRAMs were at first used frequently in general and PIM architectures respectively; then stage pipelining and on-chip buffers were also adopted. Besides, the co-design of algorithm and architecture, i.e., software and hardware, is the mainstream design plan, and in-storage architectures have also been applied. Mostly, the processing of the adjacency matrix is the central concern, because it is the cause of abundant memory access and energy consumption. In a sense, most works operate on DRAM, with some also trying methods at the cache level. But graph data and GNN scale are still increasing, which brings the scalability of GNNs into consideration.
To address the above significant aspects and gaps, this paper provides a comprehensive and systematic overview and survey of GNN accelerators:
● We provide a unified framework to categorize the studies on GNN accelerators, sorting out their architectures and methods.
● We introduce the latest work on GNN accelerators to give a clear picture of current research directions.
● We provide a comprehensive overview of the unique characteristics of GNN accelerators, as well as the challenges they incur.
● Lastly, open issues and prospects for GNN accelerator research are discussed.
The rest of this paper is organized as follows. Section 2 gives a detailed introduction to the basic components of GNNs and their accelerators, and briefly summarizes recent progress; at the same time, we give a classification and general framework of GNN accelerators. Challenges and characteristics are given in Section 3, and the solutions to these challenges are introduced in Section 4. We then review previous work and give directions for future research in Section 5. Finally, the paper concludes in Section 6.

2 Preliminaries

In this section, we give a brief introduction to the preliminaries of graphs, GNNs, and their acceleration architectures, including graph representation, several common GNN models, and architecture classification. Then we summarize some unique characteristics of graphs and GNNs, followed by related work on general-purpose processors. These characteristics and the related work further motivate our survey of GNN accelerators. Finally, we present a categorization framework of GNN accelerators for further analysis.

2.1 Graph

A graph is a data structure consisting of vertices and the edges associated with them. A graph, denoted by G, can be generally defined as G = (V, E), where V represents the vertex set and E the edge set. For a directed graph, an edge can be represented as e = (vi, vj), indicating an edge pointing from vi to vj. To represent E intuitively, we use the adjacency matrix, denoted by A, which encodes the neighbor relationships between vertices (a small sketch of this representation follows the feature list below). We provide definitions of basic graph concepts; for easy retrieval, the commonly used notations are summarized in Appendix A.
In particular, a vertex and an edge can also be attributed with single or multiple properties. Real-world natural graphs, e.g., social networks, usually have the following three general features.
● Power-law distribution. Only a few vertices are associated with most of the edges, leading to severe workload imbalance and many conflicts when high-degree vertices are aggregated.
● Sparsity. The average vertex degree is relatively small. The sparsity of graphs results in poor locality of data accesses, which causes irregular memory access and computation.
● Small-world structure. Any two vertices in the graph can be connected within a small number of hops. This feature makes it difficult to partition the graph efficiently.
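To make the representation concrete, the following minimal sketch (our illustration, assuming NumPy and SciPy) stores a small directed graph's adjacency matrix in compressed sparse row (CSR) form, the kind of sparse format most accelerators start from:

```python
# A small sketch (ours): storing a directed graph's adjacency matrix in CSR form.
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 2), (1, 2), (3, 0)]            # directed edges e = (v_i, v_j)
rows, cols = zip(*edges)
A = csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(4, 4))

print(A.toarray())       # dense view of A; A[i, j] = 1 iff edge (v_i, v_j) exists
print(A.nnz / (4 * 4))   # density: 0.25 in this toy; far below 0.1% for graphs like Reddit
```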

2.2 Graph neural network

Different from a CNN, a GNN is a type of neural network that learns from graph-structured data and obtains accurate predictions on non-Euclidean data. It is a general term for algorithms that use neural networks to learn graph-structured data, extract and explore features and patterns in it, and meet the requirements of graph learning tasks such as clustering, classification, prediction, segmentation, and generation.
Early GNNs used RNNs to deal with undirected graphs, directed graphs, labeled graphs, and cyclic graphs. It was later believed that this first type of GNN could not handle the complex and changeable graph data found in reality. As a result, the CNN architecture was applied to graphs: through an ingenious transformation of the convolution operator, the graph convolutional network (GCN) was proposed, from which many variants derive. GCN realizes the translation invariance, local perception, and weight sharing of CNNs on graphs, providing ideological guidance and reference for the construction and improvement of other GNN frameworks. Besides, since GNN models are largely based on GCN and its variants, most accelerator work on GNNs is also based on GCN designs, and the subsequent introduction and analysis in this survey are based on GCN.
GCN has two main approaches to the convolution operation: one based on spectral decomposition, i.e., spectral graph convolution, and the other based on node-space transformation, i.e., spatial graph convolution. Because spectral GNNs transform all nodes in the graph at the same time, which requires large computing and storage resources and is difficult to adapt to the analysis of large-scale graph datasets, the subsequent introduction and analysis in this survey are based on spatial GCN, the main model optimized in accelerator work. The variant models are given in Appendix B.
Existing mainstream graph neural network algorithms can be abstracted into the following model:

h_v = Aggregate(h_u^(k-1), u ∈ N(v)),   (1)
h_v^(k) = Combine(h_v^(k-1), h_v),   (2)

where h_v^(k) denotes the feature vector obtained by updating node v in the kth layer. Overall, a graph neural network is a hybrid of traditional graph computation and a neural network. In the execution of each layer, the graph neural network first traverses the entire graph (or sampled subgraphs) and aggregates the neighborhood information for each node; the node's intermediate features h_v in this layer are obtained by aggregating the information of its neighbors. This process is similar to traditional graph computation, as in Eq. (1). The intermediate features are then combined with the node's features from the previous layer; the node's output features in this layer are obtained by updating the vector to h_v^(k), a process similar to that of traditional neural networks, as in Eq. (2). The entire process is shown in Fig.1, and a code sketch follows the figure.
Fig.1 The entire process of GNN [50]
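To ground Eqs. (1) and (2), here is a minimal dense sketch of one layer (our illustration; the two weight matrices and the ReLU activation are assumptions for concreteness, not a specific model from the text):

```python
# A minimal dense sketch of one GNN layer following Eqs. (1) and (2);
# the two weight matrices and the ReLU are illustrative assumptions.
import numpy as np

def gnn_layer(A, H_prev, W_self, W_agg):
    H_agg = A @ H_prev                   # Eq. (1): aggregate neighbors (graph-computation-like)
    return np.maximum(0, H_prev @ W_self + H_agg @ W_agg)  # Eq. (2): combine (MLP-like)

rng = np.random.default_rng(0)
A = (rng.random((4, 4)) < 0.5).astype(float)   # toy adjacency matrix, 4 nodes
H = rng.standard_normal((4, 8))                # 8 input features per node
H_next = gnn_layer(A, H, rng.standard_normal((8, 16)), rng.standard_normal((8, 16)))
print(H_next.shape)                            # (4, 16)
```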


2.3 Accelerator architecture

As one of the most popular research directions in artificial intelligence, GNNs have produced a large number of different algorithms. The main general programming model is shown in Algorithm 1. Many researchers have developed graph-oriented neural network frameworks and extension libraries on common platforms. Because graph neural network applications and traditional neural network applications have both similarities and differences in execution, much existing work extends mature neural network frameworks, e.g., PyTorch and TensorFlow, into new frameworks that support graph neural network applications. These frameworks and extension libraries support a variety of GNN algorithms, and most have been open-sourced so that users can flexibly construct GNNs on top of them. The three mainstream graph neural network frameworks and extension libraries are introduced below.


PyTorch Geometric (PyG) is currently the most commonly used general graph neural network extension library. It is based on the PyTorch framework, supports running on CPU and GPU platforms, and is open source. In addition to common graph-structured data processing methods, PyG also provides support for methods such as relational learning and 3D data processing. PyG offers users a universal message passing interface, enabling them to implement new graph neural network research ideas on top of it.
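As an illustration of that interface, a toy layer might look like the following (a hedged sketch; the class and parameter names are ours, and the PyG documentation remains the authoritative API reference):

```python
# A hedged sketch of PyG's message passing interface: a custom mean-aggregation
# layer (names are ours; consult the PyG documentation for the exact API).
import torch
from torch_geometric.nn import MessagePassing

class MeanConv(MessagePassing):
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='mean')                 # aggregate phase: mean over neighbors
        self.lin = torch.nn.Linear(in_dim, out_dim)   # combine phase: linear update

    def forward(self, x, edge_index):
        h = self.propagate(edge_index, x=x)           # gathers x_j along edges, then reduces
        return torch.relu(self.lin(h))                # update node features
```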
Deep Graph Library (DGL) is an open-source graph neural network extension library built on top of several existing neural network frameworks; it currently supports TensorFlow, PyTorch, and MXNet, minimizing the workload when users migrate graph neural network models across platforms. DGL abstracts the computation of a graph neural network as user-configurable message passing units, and extracts the connection between sparse matrix multiplication in graph neural networks and the message passing mechanism, integrating the computation into generalized sparse-dense matrix multiplication (g-SpMM) and generalized sampled dense-dense matrix multiplication (g-SDDMM). In addition, DGL introduces different parallel strategies, adopting node-level parallelism for g-SpMM and edge-level parallelism for g-SDDMM, giving it high execution speed and memory access efficiency.
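The distinction between the two kernels can be conveyed with dense NumPy analogs (our simplification; DGL's real g-SpMM/g-SDDMM are fused sparse kernels, not these toy versions):

```python
# Dense NumPy analogs (ours) of the two kernel families DGL lowers message passing to.
import numpy as np

def spmm(A, H):
    # g-SpMM analog: aggregate neighbor features into node features
    return A @ H                       # A: (N, N) adjacency, H: (N, F) node features

def sddmm(A, H):
    # g-SDDMM analog: compute one value per existing edge (e.g., attention logits)
    S = H @ H.T                        # pairwise node-pair scores
    return np.where(A != 0, S, 0.0)    # sampled at the sparse adjacency pattern
```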
AliGraph [51] is an open-source distributed framework for large-scale graph neural networks released by Alibaba Group. AliGraph's system is divided into three levels: operators, sampling, and data storage. The operator level decomposes different graph neural network algorithms into three operators: sample, aggregate, and combine, and optimizes their computation. The sampling layer integrates a variety of sampling strategies so that training samples can be generated quickly and accurately. The data storage layer implements efficient data organization and storage through flexible graph partitioning, separate storage of different attributes in the graph, and caching of frequently used data.
After introducing the software aspects of GNN acceleration, the hardware hierarchy must also be considered. Because of the graph-processing-like aggregation phase, the main hardware platforms of GNN accelerators are of three types: ASIC, FPGA (CPU-FPGA), and emerging devices such as PIM. General processors, i.e., CPUs and GPUs, cannot accelerate the GNN model directly because of the two-phase execution pattern, which requires dynamic computation and irregular memory access in the aggregate stage and the opposite in the combine stage. As for accelerator hardware architectures, there are unified and separate hardware designs.
The typical representatives of separate designs are HyGCN and GraphACT [52], which accelerate the aggregate and combine stages separately. HyGCN is an early two-part GNN inference accelerator using an ASIC, in which two main computational modules perform the aggregation operation and the combination operation, respectively. The aggregation part consists of an access controller with sparsity elimination and multiple Single Instruction Multiple Data (SIMD) cores, which can be flexibly configured to aggregate at high utilization. The combination part is a matrix multiplication unit composed of multiple configurable Systolic Array (SA) [53] structures, which can work independently or in concert. GraphACT proposes a GNN training system implemented by a host CPU plus an FPGA accelerator. The host, in addition to executing the softmax function and computing the loss function during training, preprocesses the GNN task with an elaborate matching algorithm, arranges the data in the way most conducive to reuse, and transfers it to the memory on the FPGA board; the FPGA part contains two major hardware modules, which perform vector and matrix computation, along with the corresponding on-chip caches.
EnGN [54] and AWB-GCN [26] are of the unified type, which execute the two-stage computation pattern on the same hardware. AWB-GCN focuses mainly on the problem of load imbalance in sparse matrix multiplication: three techniques, distribution smoothing, remote swapping, and special row remapping, are proposed for load balancing, with corresponding datapath support designed in hardware to improve the utilization of structurally uniform PE units. Since the sparse matrix in GNN computation is determined by the task, and the sparsity of each task differs, it also proposes an auto-tuning module that adjusts the deployment strategy of the next cycle based on the current cycle's execution, so that hardware utilization improves through auto-tuning when the GNN algorithm runs several times on the same dataset. EnGN proposes a Neural Graph Processing Unit (NGPU) to perform sparse or dense matrix multiplication in GNNs. Specifically, the NGPU consists of a two-dimensional array of interconnected Processing Elements (PEs), each containing a multiply-accumulate unit, a degree-aware feature cache, and a weight cache to better exploit temporal data reuse. In addition, each column of PEs is connected by a topology called Ring Edge Reduce (RER); through this datapath, together with preprocessing of the graph data, EnGN performs sparse matrix operations, aggregating the feature data through several cycles of data transfer.
PIM is an emerging technology used to accelerate GNN computation and memory access. ReRAM, as a representative, uses the crossbar to store values as conductances: inputs are converted by DACs into wordline voltages, each bitline accumulates the currents produced by the products of voltages and conductances, and ADCs gather the multiply-accumulate results. A detailed introduction appears in Section 4. The existing accelerator hardware architectures are summarized in Tab.1.
Tab.1 Current GNN acceleration architectures
Name | Model supported | Stage optimized | Platform | Hybrid or uniform
HyGCN | GCNs | Inference | ASIC | Hybrid
Auten et al. [55] | GCNs, GATs | Inference | ASIC | Hybrid
GraphACT | GCNs | Training | CPU-FPGA | Hybrid
DeepBurning-GL [56] | GCNs | Inference | FPGA | Hybrid
AWB-GCN | GCNs | Inference | ASIC | Uniform
GCNAX [28] | GCNs | Inference | ASIC | Uniform
Cambricon-G [57] | GCN, GraphSAGE | Both | ASIC | Uniform
BlockGNN [58] | GCNs, GATs | Inference | FPGA | Uniform
ReGraphX [59] | GNNs | Training | PIM | Uniform
I-GCN [30] | GCNs | Inference | ASIC | Hybrid
GRIP [60] | GCN, GraphSAGE, GIN | Inference | ASIC | Hybrid
EnGN [54] | GCNs, GRN | Inference | ASIC | Uniform
Huang et al. [27] | GCNs | Inference | PIM | Uniform
GCoD [29] | GCNs | Inference | ASIC | Uniform
ReGNN [61] | GNNs | Inference | ASIC | Hybrid
ReGNN [62] | GCNs | Inference | PIM | Hybrid
Graphite [31] | GNNs | Both | CPU | Uniform
SmartSAGE [63] | GraphSAGE | Training | CPU-FPGA | Uniform
CoGNN [64] | GNNs | Training | GPU | Uniform
PASGCN [34] | GCNs | Inference | PIM | Uniform
FlowGNN [65] | GNNs | Inference | ASIC | Hybrid
SGCN [33] | GCNs | Inference | ASIC | Hybrid
GROW [32] | GCNs | Inference | ASIC | Uniform
GraNDe [66] | GCNs | Inference | ASIC | Hybrid
GNNAdvisor [25] | GNNs | Both | GPU | Uniform
GNNLab [20] | GNNs | Training | GPU | Uniform
Degree-Quant [22] | GNNs | Both | CPU-GPU | Uniform
FlexGraph [21] | GNNs | Training | CPU | Uniform
SGQuant [23] | GNNs | Both | Memory-constrained devices | Uniform
QGTC [24] | GNNs | Both | GPU | Uniform
Xu et al. [35] | GCNs | Training | GPU | Uniform
Table 1 displays the holistic architectural parameters of existing GNN accelerators, including the model supported, the stage optimized, the platform, and whether the hardware handles the two computation phases with separate engines (hybrid) or on the same hardware (uniform).

2.4 Categories and frameworks

We conduct a systematic review of GNN accelerators, aiming to explore the key issues in their design and implementation. As summarized in Fig.2, we identify a complete set of core components of GNN accelerators, involving three major aspects: preprocessing, computation model, and scheduling strategies.
Fig.2 The design aspects of GNN accelerators


Preprocessing A GNN accelerator usually has limited storage resources in practice, so graphs need to be partitioned. Preprocessing is an important step applied to the graph data (e.g., the adjacency matrix) to make it fit into the memory capacity of the GNN accelerator. It is also the key to matching a given processing model with appropriate hardware resources before formal processing. Moreover, it can enhance data locality, yielding more regular memory access and swifter data communication.
Computation model The GNN computation model serves as the main execution part of GNN accelerator design. Parallelism often consists of vertex-intra and vertex-inter parallelism, both belonging to the aggregate phase. The order of computation is also considered here because the dimension of the vertex property differs between phases. The implementation of this part generally relies on a hardware platform, e.g., FPGA, ASIC, or Processing-In-Memory (PIM). Different specifications have different concerns regarding hardware design and sophisticated software co-design for high throughput and energy efficiency.
Scheduling strategies This part addresses how to schedule a large number of operations and data on the limited hardware resources of GNN accelerators. The scheduling component often involves data communication, execution mode, and the scheduling scheme, aiming for higher resource utilization to solve workload imbalance problems.

3 The characteristics and challenges

3.1 GNN architecture characteristics

From the above sections, we have gained basic knowledge of GNN accelerators. In summary, we outline three typical features of GNN architecture: a two-phase computation execution mode, complex datasets, and poor scalability.
Two-phase computation execution mode As described before, the GNN model is composed of two phases in every iteration: aggregate and combine. The aggregate phase resembles graph computation, gathering node properties along the connecting edges to obtain each node's new features, i.e., the message passing mechanism. The combine phase resembles neural network computation, applying weights, biases, and activation functions to update the node's features. As a result, GNN possesses the characteristics of graph computation and neural networks simultaneously, which cannot be efficiently implemented on conventional computational resources. As Tab.2 shows, a GNN accelerator must satisfy two opposite computation and memory access modes in one architecture.
Tab.2 Execution behaviors in GNNs [49]
 | Aggregation | Combination
Access pattern | Indirect and irregular | Direct and regular
Data reusability | Low | High
Computation pattern | Dynamic and irregular | Static and regular
Computation intensity | Low | High
Execution bound | Memory | Compute
Complex dataset To further understand GNN features, we show common GNN datasets in Tab.3. As the table indicates, GNN datasets consist of considerable numbers of edges and nodes with multi-dimensional features, different from the one-dimensional node properties of traditional graph processing applications. Moreover, following the power-law distribution, the sparsity of a graph's adjacency matrix can reach 99.9%, which makes GNN computation and memory access dynamic and irregular and causes longer execution time and higher energy consumption, as shown in Fig.3. The weights in GNN, by contrast, are dense and regular, which puts GNN in a dilemma for keeping computation balanced.
Tab.3 Current common GNN dataset
Name | Nodes | Edges | Features | Storage | Classes
Pubmed (PB) | 19717 | 88648 | 500 | 38 MB | 3
Cora (CR) | 2708 | 10556 | 1433 | 15 MB | 7
Citeseer (CS) | 3327 | 9104 | 3703 | 47 MB | 6
Reddit (RD) | 232965 | 11465892 | 602 | 1.8 GB | 41
NELL (NE) | 65755 | 266144 | 5414 | 1.3 GB | 210
Fig.3 The non-zero elements of GNN adjacent matrix (a) and weight matrix (b) [26]


Poor scalability Since GNNs share features with CNNs, why not use more layers to get more accurate predictions? As shown in Fig.4, one layer gathers information from one hop of neighbors; iterating over many layers to gather more hops makes the amount of computation and memory access huge, which is bounded by the platform and also causes overfitting, severely unbalanced workload, etc.
Fig.4 An example of inference on vertex B using a two-layer GCN. The nodeflow describes the propagation of features within a message-passing layer (MPL) [60]. (a) Input graph; (b) nodeflow; (c) inference dataflow


3.2 GNN accelerators design challenges

Based on the various GNN features above, there are five main challenges in designing GNN accelerators.
Parallelism In the GNN model, parallelism is mainly exploited in the aggregate phase, which accounts for most of the execution time, as shown in Fig.5. Fig.5 illustrates the execution time ratio of the two phases on different models and datasets; the ratios differ due to the variable length of feature vectors and the execution flow of GCNs. To accelerate GNNs, we can utilize vertex-intra and vertex-inter parallelism to improve computation performance, as shown in Fig.1. Vertex-intra parallelism aggregates all of a destination node's neighbor properties in parallel; vertex-inter parallelism processes the aggregate phase of every destination node simultaneously (see the sketch after Fig.5). Nevertheless, making efficient use of both kinds of parallelism is a problem; especially in PIM architectures [34], the two parallelism modes are bounded by crossbar utilization. Leveraging both is also a challenge at the software and hardware level, because the two parallelisms must be balanced against the variable length of feature vectors and the execution flow of GCNs.
Fig.5 Execution time breakdown of the two phases [49]
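The following sketch (our illustration, not a kernel from any particular accelerator) marks where the two parallelism modes appear in a CSR-based aggregation:

```python
# Illustrative sketch (ours) of the two aggregate-phase parallelism modes.
import numpy as np

def aggregate(indptr, indices, H):
    N, F = len(indptr) - 1, H.shape[1]
    out = np.zeros((N, F))
    # Vertex-inter parallelism: each destination node's aggregation is independent,
    # so iterations of this loop can be distributed across PEs or threads.
    for v in range(N):
        nbrs = indices[indptr[v]:indptr[v + 1]]
        # Vertex-intra parallelism: the reduction over one node's neighbors (and
        # across the F feature lanes) can itself be parallelized, e.g., with SIMD lanes.
        out[v] = H[nbrs].sum(axis=0)
    return out
```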


Data locality As noted before, the adjacency matrix A is sparse and random, with poor locality, so the aggregate phase incurs considerable dynamic computation and irregular memory access. We therefore seek ways to enhance locality and decrease the energy consumed by memory accesses. In existing works, data locality mainly concerns optimizations of the adjacency matrix, because graphs following power-law distributions have many zeros in their adjacency matrices, resulting in plenty of useless computation. Yan et al. [49] adopt dynamic sparsity elimination to reduce redundant accesses, comprising interval-shard partitioning, interval-wise feature access, window sliding, and window shrinking. However, the operational complexity is considerable.
Data reuse Data reuse mainly focuses on the reuse of node features. Graphs following power-law distributions have the characteristic that a few nodes have most of the neighbors while most nodes have few. Hence, if these few nodes and their neighbors' features can be reused in GNN computation, much redundant storage and computation can be saved. Moreover, the aggregate phase has much redundancy in computation and communication, as many node features are reused in aggregation many times. Consequently, improving the reuse of node properties accelerates both phases and reduces the pressure on caches. But GNN datasets are large and sparse, which makes this challenging.
Utilization of resources Apart from the sparsity of the adjacency matrix, workload imbalance is an obvious problem when executing GNNs: some vertices have many neighbors while others have only a few, so during the aggregate phase the computation resources are monopolized by high-degree nodes. Hence, under the power-law distribution, many processing elements sit idle while others stay busy, wasting time. Resource scheduling in computation is needed to address this.
Single stage optimized From Tab.1, we can see that most GNN accelerators only accelerate the training stage or the inference stage; only a few support the holistic process. This is because much work optimizes a single stage due to the different computation patterns of GNN training and inference. Accelerators supporting both training and inference need more complicated algorithms and architectures. To be exact, a GNN accelerator differs from a CNN accelerator, which only considers forward inference and backward propagation in each iteration; a GNN accelerator must also handle the aggregate phase during training, which is another big challenge.

4 Detailed works

Given the above challenges, we now introduce existing works that handle these problems in detail, from the following three aspects.

4.1 Optimizations on memory

Graphite [31] still uses the CPU to accelerate GNNs, analyzing their execution behavior on CPUs. As shown in Fig.6, the main performance bottleneck is memory-bound execution: only 10% of the pipeline slots are attributable to useful work, and in 62% of the slots the pipeline is stalled waiting for loads and stores. Since aggregation performs a simple reduction for each vertex after gathering its neighbors' feature vectors, the operation is highly memory-intensive. Graphite therefore offloads the memory-intensive phase from the CPU to DMA (direct memory access) engines; reducing main-memory bandwidth pressure appears to be the key to optimizing GNN workloads on CPUs.
Fig.6 Breakdown of the pipeline slots spent on retiring micro-ops or stalled by different bottlenecks during a full-batch training of GraphSAGE on a CPU [31]


GNNAdvisor [25] is an efficient runtime system that systematically accelerates GNN applications on GPUs, as shown in Fig.7. The work leverages performance gains from GNN input-level properties, such as communities in graphs, node degree, and the dimensionality of node embeddings, to implement node renumbering, group-based workload partitioning, and dimension-based workload sharing. These optimizations are tailored for GNN computation to balance intra-thread efficiency and inter-thread parallelism, reducing high-overhead memory operations. Besides, a modeling and estimating strategy automates the optimization process with little manual effort, exposing a set of performance-related parameters for flexible user tuning.
Fig.7 Overview of GNNAdvisor [25]
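As one illustration of group-based workload partitioning, the sketch below (our simplification in the spirit of GNNAdvisor, with hypothetical names, not its actual implementation) splits each node's neighbor list into fixed-size groups so that GPU threads receive comparable work:

```python
# Our simplified sketch of group-based neighbor partitioning: split each node's
# neighbor list into fixed-size groups so threads get roughly even work.
def make_neighbor_groups(indptr, indices, group_size=4):
    groups = []                                  # entries: (target node, neighbor slice)
    for v in range(len(indptr) - 1):
        start, end = indptr[v], indptr[v + 1]
        for g in range(start, end, group_size):
            groups.append((v, indices[g:min(g + group_size, end)]))
    return groups

# Each group becomes one unit of work, so a high-degree node no longer
# monopolizes a single thread while low-degree nodes leave others idle.
```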


GNNLab [20] is a sample-based GNN training system for a single-machine multi-GPU setup. It contributes a new factored space-sharing design for sample-based GNN training that reduces intra-task resource contention and unleashes inter-task data locality, together with solutions to the load imbalance the factored design introduces. It also provides a general GPU-based feature caching scheme, with a caching policy based on pre-sampling that is robust to diverse sampling algorithms and GNN datasets, optimizing GNN GPU memory usage. The insights behind these challenges and characteristics are shown in Fig.8.
Fig.8 A breakdown of memory usage and data similarity for different stages of the SET model when training OGB-Papers over multiple GPUs (G0, G1,...) with 16 GB of memory each [20]


Another work, SmartSAGE [63], trains large-scale GNNs using in-storage processing (ISP) architectures, targeting large-scale datasets in real production settings. State-of-the-art (SOTA) ML frameworks employ an in-memory processing model that significantly hampers the productivity of ML practitioners, as it mandates that the overall working set fit within DRAM capacity. Hence, the authors explore the feasibility of utilizing capacity-optimized NVMe SSDs for storing memory-hungry GNN data, enabling large-scale GNN training beyond the limits of main memory size.
To reduce data redundancy in the aggregate phase, ReGNN [61] proposes a dynamic redundancy-eliminated neighborhood message passing algorithm for GNNs, exploiting the redundancy in the aggregate phase shown in Fig.9. The algorithm finds redundancy sets in advance and then reuses data within them to reduce memory pressure; ReGNN also provides hardware support for the algorithm. The holistic architecture, shown in Fig.10, can be reconfigured by adjusting the corresponding switches.
Fig.9 Illustrative examples of EdgeUpdate redundancy (ER) and Aggregation redundancy (AR) [61]
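The following sketch (our simplification; real ReGNN discovers shared neighbor subsets dynamically, whereas this toy version only reuses identical neighbor sets) conveys the idea of caching a partial aggregation so several target nodes can share it:

```python
# A toy sketch (ours) of aggregation-redundancy reuse: if several target nodes
# have identical neighbor sets, aggregate that set once and share the result.
import numpy as np

def aggregate_with_reuse(neighbors, feats):
    cache, out = {}, {}                       # frozenset of neighbors -> partial sum
    for v, nbrs in neighbors.items():
        key = frozenset(nbrs)
        if key not in cache:
            cache[key] = feats[list(nbrs)].sum(axis=0)
        out[v] = cache[key]                   # reuse the cached partial aggregation
    return out

feats = np.arange(12.0).reshape(4, 3)
print(aggregate_with_reuse({0: (1, 2), 3: (1, 2)}, feats))  # set (1, 2) summed once
```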


Fig.10 Overview of ReGNN [61]


I-GCN [30] and GCoD [29] are two typical works using data locality to accelerate GNNs. I-GCN uses a mechanism called islandization, a new online graph restructuring algorithm that clusters nodes with strong internal but weak external connections. The islandization process yields two major benefits. First, by processing islands rather than individual nodes, there is better on-chip data reuse and fewer off-chip memory accesses. Second, there is less redundant computation, since the aggregation of common/shared neighbors within an island can be reused. This is done without any preprocessing of the graph data or adjustment of the GCN model structure. Experimental results show that I-GCN can significantly reduce off-chip accesses and prune 38% of aggregation operations. The whole process is shown in Fig.11.
Fig.11 Overview of I-GCN [30]


GCoD [29] integrates a split-and-conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without affecting model accuracy, so that the graph adjacency matrices have merely two types of workload and benefit from largely improved regularity and thus ease of acceleration; the overview is in Fig.12. GROW [32] treats GCN as sparse-dense matrix multiplication and utilizes a uniform architecture for GCN inference. Its main idea is to employ a row-stationary dataflow based on row-wise product matrix multiplication (i.e., Gustavson's algorithm [67]), allowing flexible, fine-grained adaptation to the two heterogeneous sparsity patterns, which addresses inefficient data movement and significantly reduces wasted memory bandwidth. Besides, it puts forward a multi-row stationary runahead execution model to maximize memory-level parallelism and overall throughput. The architecture of GROW is exhibited in Fig.13.
Fig.12 Overview of GCoD [29]


Fig.13 Overview of GROW [32]
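For reference, row-wise product matrix multiplication can be sketched as follows (a simplified software analog of the dataflow GROW builds on, not GROW's hardware pipeline):

```python
# A simplified software analog (ours) of row-wise product matrix multiplication
# (Gustavson's algorithm) for C = A @ B with A sparse in CSR form and B dense.
import numpy as np

def row_wise_spmm(indptr, indices, data, B):
    n, k = len(indptr) - 1, B.shape[1]
    C = np.zeros((n, k))
    for i in range(n):                          # output row i stays "stationary"
        for p in range(indptr[i], indptr[i + 1]):
            j, a_ij = indices[p], data[p]       # nonzero A[i, j]
            C[i] += a_ij * B[j]                 # scale row j of B and accumulate
    return C
```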


SGCN [33] exploits the compressed-sparse features of the input to compress data in a way suitable for GCNs. The work applies its compression model not only to the aggregate phase, which most works focus on, but also to the combine phase, reducing off-chip memory traffic. To better handle data locality under varying sparsity, SGCN proposes sparsity-aware cooperation. The basic view of SGCN is displayed in Fig.14. The work mainly targets GCNs with deeper, more residual layers, where intermediate feature sparsity offers greater potential.
Fig.14 Overview of SGCN [33]


4.2 Optimizations on computation

Work on accelerating GNN computation focuses on scheduling schemes, dataflow, graph tiling, graph partitioning, and graph reordering. GraphACT [52], AWB-GCN [26], GCNAX [28], EnGN [54], and GRIP [60] are typical works in this direction.
GraphACT proposes a GNN training system implemented by a host CPU plus an FPGA accelerator. The host, in addition to computing results such as the softmax function and the loss function during training, preprocesses the GNN task with an elaborate matching algorithm, arranges the data to be computed in the way most conducive to reuse, and transfers it to the memory on the FPGA board. The FPGA part contains two major hardware modules, which perform vector computation and matrix computation, as well as the corresponding on-chip caches. Since the work targets training, the preprocessing on the host can be performed in parallel with the task computation on the FPGA accelerator, making it more efficient. Fig.15 shows the timing flow of the system for a GNN training task.
Fig.15 Per-minibatch scheduling between CPU and FPGA (up), and between FPGA computation modules (down) [52]


AWB-GCN focuses on the problem of load imbalance in sparse matrix multiplication, proposing three techniques, distribution smoothing, remote swapping, and special row remapping, for load balancing, and designing the corresponding datapath support in hardware to improve the utilization of structurally uniform PE units. Since the sparse matrix in the computation is determined by the task, and the sparsity of each task differs, the paper also proposes an auto-tuning module that adjusts the deployment strategy of the next cycle based on the current cycle's execution, so that hardware utilization improves through auto-tuning when the GNN algorithm runs several times on the same dataset.
GCNAX proposes a flexible and optimized dataflow for GCNs that simultaneously improves resource utilization and reduces data movement. This is realized by fully exploring the design space of GCN dataflows and evaluating the number of execution cycles and DRAM accesses through an analysis framework. As shown in Fig.16, GCNAX compares the two execution orders of the operations in a GCN layer to further study the dataflow and execution mode of GNNs.
Fig.16 The number of operations for the five datasets (first layer) using the two execution orders [28]
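The effect of the execution order can be estimated with a quick back-of-the-envelope count (our sketch; the hidden size C is an assumed value, and normalization and self-loops are ignored), using Cora's statistics from Tab.3:

```python
# Back-of-the-envelope multiply-accumulate counts (ours) for the two execution
# orders of a GCN layer, using Cora's statistics from Tab.3; the hidden size C
# is an assumed value.
N, F, C, nnz = 2708, 1433, 16, 10556     # nodes, input features, hidden size, edges

aggregate_first = nnz * F + N * F * C    # (A @ X) @ W: aggregate on F-dim features first
combine_first   = N * F * C + nnz * C    # A @ (X @ W): shrink features to C dims first

print(f"{aggregate_first:,} vs {combine_first:,}")   # 77,215,772 vs 62,257,920 here
```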


FlexGraph [21] classifies GNNs according to neighborhood definitions and aggregation patterns, identifying the key expressivity and performance challenges that GNN frameworks face when training GNN models. The work uses DNFA (direct neighbors with flat aggregation), INFA (indirect neighbors with flat aggregation), and INHA (indirect neighbors with hierarchical aggregation) to define GNN models based on both static and operational characteristics. Moreover, it provides a programming abstraction named NAU (NeighborSelection, Aggregation, and Update) that supports these categories. It then leverages a mixed execution scheme for hierarchical aggregation, as well as an application-driven workload balancing strategy and a pipeline processing strategy for efficient distributed training, thereby optimizing GNN computation. The entire process is shown in Fig.17.
Fig.17 FlexGraph architecture [21]


To accelerate the aggregate computation of backward propagation in GCN training, Xu et al. [35] propose an execution path preparing approach that reduces the computation in training, after discovering that GCN training can be equivalently transformed into a partially-active graph processing procedure from the graph processing perspective, as shown in Fig.18. Moreover, they propose a structure-aware way of computing the optimal group sizes for the execution paths, after studying the effective group sizes for performing backward aggregation on different execution paths. They compare their work with GNNAdvisor on GCN training and achieve better performance.
Fig.18 Execution paths of backward aggregations in two layers on the example graph [35]


GRIP emphasizes the concept of nodeflow: in GNN inference it is usually unnecessary to compute the output of all nodes, only a few; from these few output nodes, the connection relations of the graph can be traced backwards to derive the input nodes and their connections at each layer, constituting a minimal computation subgraph. GRIP designs multiple caches and corresponding prefetching modules to process nodeflow data, which are sent via a crossbar to multiple units performing the aggregation of multiple nodes in parallel; the node feature transformation part is accelerated by weight-chunking caches. The overall architecture is shown in Fig.19. This work can support multiple GNN variants by combining the dataflow between the intermediate caches, and achieves spatial reuse of data as much as possible through multiple caches and computational units, although the crossbar is not sufficiently optimized and temporal reuse of data is less considered.
Fig.19 Overview of GRIP [60]


To accelerate quantized graph neural networks via GPU Tensor Cores, QGTC [24] introduces a novel quantized low-bit arithmetic design based on low-bit data representation and bit-decomposed computation, which supports QGNNs with diverse precision demands. It also leverages subgraph partitioning and batching, as well as zero-tile jumping, to optimize computation on top of GPU Tensor Cores. The main process is shown in Fig.20. QGTC implements its techniques at the software, GPU kernel, framework, and algorithm levels.
Fig.20 The main framework of QGTC [24]


Another quantization work on GNNs is Degree-Quant [22]. Through quantization-aware training for GNNs, it designs a method based on degree characteristics and makes corresponding optimizations. The work discovers and explains the sources of accuracy degradation in GNNs when using lower-precision arithmetic, as shown in Fig.21. It then analyzes how the choice of straight-through estimator (STE) implementation, node degree, and the method for tracking quantization statistics during training impact performance; the analysis is theoretically comprehensive and achieves good performance with INT8 and INT4 computation.
Fig.21 High-level view of the stochastic element of Degree-Quant. Masked (high in-degree) nodes, in green, operate at full precision, while unmasked nodes (red) operate at reduced precision. High in-degree nodes contribute most to poor gradient estimates, hence they are stochastically masked more often [22]
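As background for the STE mentioned above, a minimal symmetric fake-quantization with a straight-through estimator might look like this (our sketch of the generic building block behind quantization-aware training, not Degree-Quant's exact masking scheme):

```python
# Our sketch of symmetric fake-quantization with a straight-through estimator (STE);
# illustrative only, not Degree-Quant's exact scheme.
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax          # per-tensor scale
        return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None      # STE: pass gradients through round() unchanged

w = torch.randn(8, requires_grad=True)
w_q = FakeQuant.apply(w)           # quantize in forward, identity gradient in backward
```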


SGQuant [23] is a typical work on quantization for memory optimization. The work investigates a multi-granularity quantization strategy that operates at different levels (components, graph topology, and layers) of GNN computation, and offers automatic bit selection (ABS) to pinpoint the most appropriate quantization bits for these multi-granularity quantizations, as shown in Fig.22. It proposes component-wise, topology-aware, and layer-wise quantization to meet diverse data precision demands; moreover, end-to-end automatic bit selection makes the most appropriate choice for the different quantization granularities.
Fig.22 Multi-Granularity Quantization: (a) Component-wise, (b) Topology-aware, (c) Layer-wise, and (d) Uniform Quantization. NOTE: the same color represents the same quantization bit [23]


EnGN is a typical representative of the unified architecture; this work proposes a Neural Graph Processing Unit (NGPU) to perform sparse or dense matrix multiplication in GNNs. Specifically, the NGPU consists of a two-dimensional array of interconnected Processing Elements (PEs); each PE contains a multiply-accumulate unit, a degree-aware feature cache, and a weight cache, which better exploit temporal data reuse. In addition, each column of PEs is connected by a topology called Ring-Edge-Reduce (RER); through this datapath, together with preprocessing of the graph data, EnGN performs sparse matrix operations, aggregating feature data through several cycles of data transfer. Its advantages are that the degree-aware cache reduces the number of accesses and the 2D array structure has better circuit characteristics; its shortcomings are that the aggregation operation of GNNs is not well supported and the RER structure incurs additional overhead for aggregation.
FlowGNN [65] considers real-time GNN inference, where the amount of work cannot be known in advance, i.e., the workload is agnostic. The fundamental architecture is depicted in Fig.23. The architecture can process multiple nodes and edges simultaneously, enabled by an NT-to-MP adapter via on-the-fly multicasting. FlowGNN analyzes the dataflow in GNN models, identifying limitations in existing acceleration architectures. Consequently, it proposes an explicit message passing mechanism that dramatically improves the generality of the GNN accelerator, and multi-level parallelism (inter-node, intra-node, and inter-edge, across both message passing and node transformation) via a multi-queue-based dataflow that notably improves performance without losing generality.
Fig.23 Overview of FlowGNN [65]


4.3 Optimizations on emerging technologies

The processing in memory (PIM) architecture is booming for memory-intensive workloads. ReRAM, as a representative, is mainly composed of wordlines, bitlines, DACs, ADCs, etc., and can be used to accelerate GNN aggregation. As shown in Fig.24, (a) is the basic construction of ReRAM and (b) is its workflow. This architecture can efficiently accelerate matrix-vector multiplications (MVM): the weight matrix is generally mapped onto the resistive crossbar; input data passes through DACs to become wordline voltages; each bitline then accumulates the currents given by the products of voltages and conductances; and ADCs convert the accumulated currents back into digital results. At present, the typical PIM works on GNNs are Huang et al. [27], ReGNN [62], and PASGCN [34,50]. Huang et al. [27] propose a new PIM accelerator called REFLIP, shown in Fig.25. First, REFLIP adopts ReRAM crossbar structures to build a unified architecture supporting both GNN phases. Second, REFLIP adopts a flipped-mapping scheme that exchanges the positions of weights and inputs, reducing the sparsity of the adjacency matrix and improving massive crossbar-structured parallelism. The scheme is shown in Fig.26.
Fig.24 The ReRAM architecture [28]
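An idealized numerical sketch of such a crossbar MVM is given below (our illustration, assuming perfect devices; real designs must cope with DAC/ADC precision, IR drop, and device variation):

```python
# An idealized numerical sketch (ours) of a ReRAM crossbar matrix-vector multiply;
# perfect devices are assumed.
import numpy as np

def crossbar_mvm(G, v_in, adc_bits=8):
    i_out = v_in @ G                              # bitline currents: Ohm's law + Kirchhoff sums
    scale = np.abs(i_out).max() / (2 ** adc_bits - 1) + 1e-12
    return np.round(i_out / scale) * scale        # ADC digitizes the accumulated currents

G = np.random.rand(64, 64)     # weight matrix mapped onto crossbar conductances
v = np.random.rand(64)         # DAC-driven wordline voltages encode the input vector
print(crossbar_mvm(G, v)[:4])
```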


Fig.25 The REFLIP architecture [27]


Fig.26 Comparing and contrasting the classic graph mapping and our flipped mapping for GCNs. (a) A sample graph, (b) traditional graph processing mapping scheme that maps sparse edge data into crossbar arrays, and (c) the flipped-mapping scheme in REFLIP that maps multi-dimensional vertex features into crossbar arrays. The notation ei,j represents an edge pointing from the vertex i to the vertex j [27]


ReGNN [62] is composed of analog PIM (APIM) and digital PIM (DPIM) modules for accelerating MVM operations and non-MVM aggregate operations, respectively. ReGNN maps data to aggregation sub-engines based on vertex degree and feature vector dimension to increase data parallelism. The architecture of ReGNN is shown in Fig.27.
Fig.27 Overview of ReGNN [62]


The latest PIM work on GNNs is PASGCN [34,50]. This work fully utilizes vertex-intra and vertex-inter parallelism by employing dense data mapping and a search-execute architecture with acceptable crossbar cost, as shown in Fig.28. There are also two scheduling strategies for maximizing vertex-inter parallelism: optimal scheduling is reduced to a maximum independent set problem, which is solved by an algorithm called node-grouping. Besides, this work explores the task-irrelevant information in graphs and proposes an adaptively sparsified GCN network named ASparGCN. ASparGCN exploits a multilayer perceptron (MLP)-based edge predictor to obtain edge selection strategies for each GCN layer separately and adaptively during training, and performs inference with only the selected edges at test time. The whole architecture is shown in Fig.29.
Fig.28 GCN design with the two design patterns. (a) Adjacent matrix; (b) GCN design with general MM (GEMM) crossbars; (c) GCN design with CAM crossbars and MAC crossbars [34]


Fig.29 Overview of PASGCN [34]


GraNDe [66] uses a near-data processing (NDP) architecture to map matrices adaptively for GCNs. Its architecture is displayed in Fig.30. The aggregate phase is offloaded from the general processing units to NDP modules located on each DIMM's buffer chip, one per rank. Placing the processing elements (NDP modules) near the DRAM datapath exploits rank-level parallelism: with the buffer device in the middle of the DRAM datapath, multiple ranks can transfer data to the processing units concurrently. Besides, by exploring how the operand matrices are mapped to DRAM ranks, the paper finds that the optimal mapping differs depending on the configuration of the specific GCN layer.
Fig.30 Overview of GraNDe [66]


5 Future directions and reviews

On the basis of this extensive survey of GNN accelerators, we have learned the characteristics of GNNs, the challenges in designing GNN accelerators, and the related works addressing these problems. We now review these works. In non-PIM architectures, most works act on the adjacency matrix, which is sparse and follows a power-law distribution, e.g., by enhancing locality and reducing data redundancy. These works also tackle workload imbalance through hardware design or scheduling schemes to improve resource utilization, and some optimize the dataflow (nodeflow). In PIM architectures, computation efficiency is provided by the structure itself, so the core works concern parallelism and balanced resource utilization. To be exact, the main works target the aggregate phase, which is the performance and energy consumption bottleneck of GNNs.
From our perspective, the following aspects deserve future work.
● Previous hardware acceleration works mainly accelerate only one of the two phases in GNN, especially the aggregate phase, which is the key bottleneck. We should take a holistic view and design GNN hardware accelerators that support optimization both within and between the stages. "Within these stages" covers optimizations on aggregation, combination, and each of them individually; "between these stages" means optimizations on the dataflow, communication, and buffering mechanisms between aggregation and combination.
● Data compression [22-24]. Compressing the data can bring notable performance improvement, and we can further explore the effects of data precision and model pruning on GNNs.
● Hardware acceleration methods for traditional platforms. Although many works target traditional platforms such as CPUs [31] and GPUs [64], only a few address the hardware aspects. We can still explore balanced workloads, performance-resource models [56,58], and substitute elements that compensate for the drawbacks of traditional algorithms and software platforms.
● Further optimizations on the adjacency matrix. We can further explore adjacency matrix locality, using more accurate methods to retain the useful information.
● Explorations on PIM. ReRAM has proven successful as a GNN platform. We can therefore explore more PIM architectures designed specifically for GNN acceleration.
● Dataset characteristics analysis. Some previous works [25,29] address dataset processing, but more characteristics remain to be excavated in GNN datasets, e.g., real social network datasets following power-law distributions, and other such datasets should be further sought out. We believe GNNs can be optimized for various realistic datasets.

6 Conclusions

This paper first introduces the basics of graph neural network applications, common algorithms, application scenarios, programming models, and the mainstream frameworks and extension libraries on common platforms. Taking the accelerator design challenges posed by the execution behavior of graph neural networks as a starting point, it then presents a detailed and systematic analysis of the key optimization techniques in this field, covering overall structure design, computation, memory, and emerging technologies. To face these challenges, we introduced the related works that handle these issues. Furthermore, this paper presents the key optimization techniques and future directions in the design of GNN accelerators, provides a clear understanding of the current state of research in this field, and aims to inspire researchers in the design of accelerator structures.

Nan Sun received the BS degree in Detection Guidance and Control from Northwestern Polytechnical University, China in 2018 and the MS degree in Aeronautical and Astronautical Science and Technology from Harbin Institute of Technology, China in 2020. He is currently pursuing the PhD degree with the School of Cyber Science and Technology, Beihang University, China. His research interests include attack detection and federated learning

Wei Wang received her BEng degree in Electrical Engineering and Automation from Beihang University, China in 2005, MSc degree in Radio Frequency Communication Systems with Distinction from the University of Southampton, UK in 2006, and PhD degree from Nanyang Technological University, Singapore in 2011. From January 2012 to June 2015, she was a lecturer with the Department of Automation at Tsinghua University, China. Currently, she is an associate professor with the School of Automation Science and Electrical Engineering at Beihang University, supported by the BUAA Young Talent Recruitment Program. Her research interests include adaptive control of uncertain systems, distributed cooperative control of multi-agent systems, fault tolerant control, secure control of cyber-physical systems, and flight control systems

Yongxin Tong received the PhD degree in Computer Science and Engineering from The Hong Kong University of Science and Technology, China in 2014. He is currently a professor in the School of Computer Science and Engineering, Beihang University, China. His research interests include federated learning, privacy-preserving data analytics, big spatio-temporal data analytics, crowdsourcing, and reinforcement learning. He has published more than 100 papers in prestigious international journals (e.g., ACM TODS, IEEE TKDE, and VLDBJ) and conferences (e.g., SIGMOD, SIGKDD, VLDB, and ICDE). He is an associate editor for IEEE Transactions on Knowledge and Data Engineering (TKDE), etc. He was named an Alibaba DAMO Academy Young Fellow in 2018, and received the Excellent Demonstration Award from VLDB 2014 and the championship of KDD Cup 2020

Kexin Liu received the MSc degree in Control Science and Engineering from Shandong University, China in 2013, and the PhD degree in System Theory from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China in 2016. From 2016 to 2018, he was a postdoctoral fellow at Peking University, China. Currently, he is an associate professor with the School of Automation Science and Electrical Engineering, Beihang University, China. His research interests include multi-agent systems and complex networks


Acknowledgements

This work was supported in part by the National Key R&D Program of China (2018AAA0101100), and in part by the National Natural Science Foundation of China (Grant Nos. 62022008 and 92067204).

Competing interests

The authors declare that they have no competing interests or financial conflicts to disclose.

RIGHTS & PERMISSIONS

© 2024 Higher Education Press