Introduction
Urban traffic congestion has become a global issue with severe effects, such as increased travel time, fuel consumption, and air pollution [1−3]. It has serious implications for the global economy and the environment. For example, in recent years, the USA alone has lost $87 billion per year in extra driving time and gasoline due to traffic jams [4, 5].
Inadequate traffic infrastructure, the rapid growth of vehicles, and pre-determined traffic signal control (TSC) methods are the leading causes of urban traffic congestion
[ 6] . Addressing urban planning and infrastructure concerns necessitates significant financial and material resources. As a result, improving the existing TSC methods is the most cost-efficient way to relieve traffic congestion.
In recent years, significant progress has been made in reinforcement learning (RL) for solving sequential decision-making problems in Artificial Intelligence (AI) games, such as Atari [7], Go [8], and Dota 2 [9, 10]. The TSC problem can be regarded as an agent that makes decisions at intersections by interacting with the environment, just like in a game.
It is challenging to alleviate traffic congestion by optimizing and controlling traffic signals only at a single intersection
[ 11] . TSC is being extended from optimization of a single intersection to multiple intersections, which can be formulated as a multi-agent system with cooperation between agents. Hence, multi-agent reinforcement learning (MARL) has been receiving more attention from researchers in these years
[ 12− 14] . Furthermore, given the network transmission and communication required between traffic lights, efficiently deploying MARL in practice remains a research challenge.
Additionally, most research has focused on theories and algorithms in laboratory settings, with little consideration of industrial-scale deployment issues. Given the limited network transmission bandwidth and underlying computing resources, optimizing the deployment structure and algorithmic performance for TSC is essential for intelligent transportation systems (ITS).
ITS-related technologies, such as urban traffic simulation environments
[ 15− 18] , TSC algorithms
[ 19, 20] , and traffic signal communication
[ 14, 21] , have considerably enhanced traffic operation and management. To the best of our knowledge, little research considers all of the above aspects for TSC, and few works can be deployed in the real world. We summarize several existing challenges in TSC:
1) Although some studies have contributed to multi-intersection TSC
[ 22− 24] , the field lacks an integrated architecture that leverages the traffic simulation environment, cooperative TSC algorithm, and traffic signal communication to achieve optimal multi-intersection TSC;
2) Traditional algorithms in urban TSC rarely consider traffic light cooperation and communication simultaneously;
3) Most studies optimize the algorithms but ignore the network capacity or latency in the urban TSC process.
To address the aforementioned challenges, an integrated and cooperative architecture for TSC across multiple intersections is proposed in this paper. The main contribution is threefold:
1) An integrated architecture, namely General City Traffic Computing System (GCTCS), is proposed, which integrates an urban traffic simulation environment, TSC algorithm, and communication across the traffic signal network simultaneously.
2) A MARL algorithm, namely General-MARL, is developed for TSC based on GCTCS, considering cooperation and communication between traffic lights.
3) Comprehensive experiments have been conducted to validate the proposed architecture and algorithm, with promising results. The experimental results indicate that the proposed architecture is much closer to the real-life traffic environment. With the proposed algorithm, the average speed of vehicles is increased by 23.2%, and the network latency is reduced by 11.7% compared with baseline algorithms.
The remainder of this paper is organized as follows: Section Related Work introduces related works. Section Preliminary explains the basic concepts and problem definition. Section Methodology describes the architecture of GCTCS and details the General-MARL method for cooperative traffic light control based on GCTCS. Section Experiments conducts experiments to demonstrate the advantage of the General-MARL algorithm. Section Conclusions concludes the paper and discusses future work.
Related Work
In this section, we introduce and discuss studies on TSC, which can be divided into two typical categories: conventional approaches and RL-based methods.
Conventional methods for TSC
Conventional TSC methods are classified into four types: fixed-time control
[ 25] , actuated control
[ 26] , adaptive control
[ 27] , and optimization-based control
[ 28] . Fixed-time control is the conventional and primary method of urban TSC, benefiting from its simplicity of deployment. It usually consists of a pre-timed cycle length, a fixed cycle-based phase sequence, and a phase split. When calculating the cycle length and phase split, the traffic flow is assumed to be uniform during a specific period. Since its introduction in the 1950s [25], it has been a leading solution to TSC in practice, because the urban traffic environment is complex and uncertain and mathematical approaches cannot precisely model the internal operational mechanisms of a TSC system. Actuated control [26] decides whether to keep or change the current phase based on pre-defined rules and real-time traffic data. For example, it can set the green signal for a specific traffic signal phase if the number of approaching vehicles is larger than a threshold. Based on traffic volume data from loop sensors, adaptive control [27] creates a set of traffic plans and chooses the one that best fits the current traffic situation. Optimization-based control [28] formulates TSC as an optimization problem under dynamic traffic flow and decides the traffic signal according to the observed traffic information. All of the methods discussed above rely heavily on human-designed traffic signal plans or rules.
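The threshold rule used by actuated control can be illustrated with a minimal Python sketch. The function name, the phase labels, and the threshold value are hypothetical and chosen only for illustration; practical actuated controllers also enforce minimum and maximum green times, which are omitted here.

```python
def actuated_phase(current_phase, approaching, threshold=5):
    """Return the phase to serve next under a simple threshold rule.

    current_phase: label of the phase that is green now (e.g. "NS").
    approaching:   dict mapping phase label -> number of approaching vehicles.
    threshold:     hypothetical vehicle count that triggers a switch.
    """
    # Candidate phases other than the one currently served.
    others = {p: n for p, n in approaching.items() if p != current_phase}
    if not others:
        return current_phase
    # Pick the competing phase with the most approaching vehicles.
    busiest = max(others, key=others.get)
    # Switch only if the demand on that phase exceeds the threshold.
    if others[busiest] > threshold:
        return busiest
    return current_phase
```

The rule is entirely hand-designed: the threshold and the set of phases must be chosen by an engineer, which is the dependence on human-designed plans noted above.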
RL-based methods for TSC
RL-based methods have emerged as a promising TSC solution, which are designed for different application scenarios including single intersection control
[ 11, 29] , and multi-intersection control
[ 30, 31] .
Single intersection control
Abdulhai et al.
[ 11] introduced Q-learning for TSC and presented a case study involving application to traffic signal control. Li & Wang
[ 29] proposed the idea to set up a deep neural network (DNN) to learn the Q-function of reinforcement learning from the sampled traffic state/control inputs and the corresponding traffic system performance result. Park et al.
[ 32] developed two traffic signal control models using reinforcement learning and a microscopic simulation-based evaluation for an isolated intersection. Additionally, the models could also be adapted for two coordinated intersections.
Multi-intersection control
Multi-agent reinforcement learning (MARL) involves the participation of more than one agent
[ 12] . It can learn through the cooperation of (1) sharing instantaneous information through interaction with the environment and (2) sharing learned policies in episodic experience.
MARL is a suitable method for the TSC problem, which can be solved as a typical MARL system for optimization of all intersections
[ 13, 14] . There exist intelligent traffic agents in the environment that can facilitate learning progress in MARL. Co-DQL model
[ 12] used a highly scalable independent double Q-learning method based on double estimators and the upper confidence bound (UCB) policy for multiple intersections. Wang et al.
[ 13] proposed two distributed MARL control models as well as a Federated Learning (FL) framework to solve the adaptive traffic signal control (ATSC) problem, where the former is based on the Advantage Actor-Critic (A2C) algorithm, and the latter is based on the Federated Averaging (FedAvg) algorithm. El-Tantawy et al.
[ 30] investigated the following dimensions of the control problem: (1) RL learning methods, (2) traffic state representations, (3) action selection methods, (4) traffic signal phasing schemes, (5) reward definitions, and (6) variability of flow arrivals to the intersection. Rasheed et al.
[ 31] designed a multi-agent DQN (MADQN) and investigated its use to further address the curse of dimensionality under traffic network scenarios with high traffic volume and disturbances. The multi-agent auto communication (MAAC) algorithm [14] is an adaptive global traffic light control method based on MARL and an auto communication protocol in an edge computing architecture. The MAAC model considered traffic communication but did not address MARL and traffic simulation environment optimization together.
From the literature, we find that most studies attempt to develop an RL or MARL model to address the TSC problem directly, ignoring the traffic simulation environment optimization and traffic communication simultaneously.
Preliminary
Traffic simulation environment
In order to create models for simulating complex traffic dynamics, urban traffic simulation systems typically integrate computing technologies and operational features of traffic flow (as illustrated in Fig. 1). Road network, vehicle description, traffic signal control, and communication between the simulation system and traffic equipment are typically the basic elements of an urban traffic simulation system.
Fig. 1 An urban traffic simulation environment for traffic research.
Traffic simulation systems can be divided into two groups: commercial and open-source systems. VISSIM [33] and Paramics [34] are widely used commercial traffic simulation systems, while open-source systems, such as MIT Simlab [15], SUMO [16], and CityFlow [17], are usually adopted by researchers.
Traffic computing structure
Communication is one of the key functions in cooperation, which usually suffers from inevitable network latency. However, relevant studies on the development and deployment of computing frameworks often ignore the network delay in communication. In general, there are three major IoT computing structures, namely cloud computing
[ 35] , fog computing
[ 21] , and edge computing
[ 36] .
Cloud computing is a ubiquitous, convenient, and on-demand repository of computing resources such as networks, servers, and storage. Fog computing is closer to where data is generated than cloud computing. In edge computing, data processing is performed at the location nearest to where terminal devices generate the data.
Problem definition
The TSC problem can be regarded as an agent that can make decisions at intersections by interacting with the environment.
It can be formulated as a Markov Decision Process (MDP) [37] ($\langle S, A, P, R, \pi, \gamma \rangle$), with the traffic state set $S$, action set $A$, transition probability $P$, reward $R$, control policy $\pi$, and discount factor $\gamma$, as shown in Fig. 2.
Fig. 2 An illustration of traffic signal control at an intersection. According to the current traffic state and reward, the agent selects and executes the corresponding action (change or maintain the current traffic light). Then the agent evaluates the effects of the action to obtain a new traffic state and a new reward.
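The agent-environment interaction of a single intersection can be sketched in Python as follows. This is only a toy illustration under stated assumptions: the queue dynamics, the discharge and arrival rates, and the hand-written policy are all hypothetical and do not reproduce the simulator or the learned policy of this paper. Only the discount rate of 0.982 is taken from the experiment settings.

```python
import random

GAMMA = 0.982  # discount rate listed in the experiment settings


def step(queues, phase, action, rng):
    """Toy transition with hypothetical dynamics.

    queues: [north-south queue, east-west queue]
    phase:  0 if north-south is green, 1 if east-west is green
    action: 0 = maintain the current phase, 1 = change the phase
    """
    if action == 1:
        phase = 1 - phase
    queues = list(queues)
    # The green approach discharges up to 3 vehicles per step (assumed).
    queues[phase] = max(0, queues[phase] - 3)
    # Each approach receives 0 to 2 new vehicles per step (assumed).
    queues = [q + rng.randint(0, 2) for q in queues]
    reward = -sum(queues)  # fewer waiting vehicles means a higher reward
    return queues, phase, reward


def policy(queues, phase):
    """Hand-written stand-in for a learned policy: serve the longer queue."""
    longer = 0 if queues[0] >= queues[1] else 1
    return 0 if longer == phase else 1


def rollout(steps=20, seed=0):
    """Run one episode and return the discounted sum of rewards."""
    rng = random.Random(seed)
    queues, phase = [4, 6], 0
    discounted_return = 0.0
    for t in range(steps):
        action = policy(queues, phase)
        queues, phase, reward = step(queues, phase, action, rng)
        discounted_return += (GAMMA ** t) * reward
    return discounted_return
```

An RL method replaces the hand-written `policy` with one that is learned to maximize the discounted return.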
In this paper, we formalize the TSC problem as decentralized intersection optimization by MARL for multi-intersection (as shown in Fig. 3). Each intersection is controlled by an algorithm . At time step , makes an optimal decision to choose proper signal phase after analyzing its observation of the traffic state (vehicle queue, number of vehicle and etc.) at the intersection. In this case, the time T (average travel time) that vehicles spend in the traffic network can be minimized, denoted as:
Fig. 3 The MARL structure in urban traffic signal control.
where is the average time vehicles spending at intersection based on algorithm on state .
In a multi-agent system, the cooperation between agents is usually based on one computing structure. Thus, an effective and efficient integration solution covering different aspects is needed for TSC in practice.
Methodology
Architecture
In this section, we introduce an integrated and cooperative architecture, i.e., GCTCS, for urban TSC across multiple intersections. It integrates an urban simulation environment, a hybrid computing framework (cloud computing, fog computing, and edge computing), an interface for TSC algorithm deployment, a dynamic prediction module of traffic flow for traffic configuration interface, and an edge computing node of real-time traffic video processing
[ 38] .
GCTCS is developed to support more comprehensive and efficient TSC experiments involving multiple intersections and to provide simulation data that are closer to real traffic conditions for industrial deployment, as shown in Fig. 4.
Fig. 4 The General City Traffic Computing System (GCTCS) for urban traffic signal control.
GCTCS dynamically connects the simulation environment with the natural environment of multi-intersection traffic signals; provides urban TSC algorithm interfaces deployed under the hybrid computing architecture of cloud computing, fog computing, and edge computing; and takes complete account of network bandwidth and network delay. GCTCS can break through the barrier between the real traffic flow across multiple intersections and the traffic flow configuration of the simulation environment and realize dynamic synchronization between the real environment and the simulation environment. In addition, the overall computing architecture of GCTCS is based on a hybrid computing framework, which fully considers and simulates network delay. It provides interfaces for urban TSC and traffic flow prediction algorithms.
The functions of each layer
The cloud computing layer is located at the top layer of the architecture. It is responsible for interacting with each fog computing node in the fog layer, collecting information at the fog node, and sending control instructions from the cloud computing layer.
The fog computing layer is located between the cloud computing and edge computing layers and consists of various powerful servers, routers, and controllers that can bear heavy processing
[ 39] . The primary function is to compute and output the control signal instructions.
The edge computing nodes are located at the bottom layer, mainly processing surveillance videos and extracting status information at the current traffic intersection.
The module of traffic simulation environment
We designed the module of urban traffic simulation environment based on the urban traffic simulation platform described above, and it flexibly establishes the traffic flow forecasting interface for multiple intersections
[ 40] . The module can be configured by dynamic traffic flow based on the current simulation environment, making the environment more realistic.
This module employs the external interface of the traffic flow forecasting method to dynamically construct traffic flow at each intersection, allowing the traffic simulation system to assimilate the current conditions and provide an algorithm interface for TSC. The FLOW simulation framework [18] is used to optimize the urban traffic simulation environment, benefiting from forecasting capabilities based on datasets and time series models.
The traffic environments in different cities differ, so a unified method is needed to obtain traffic flow conditions and construct an urban traffic environment closer to the real world (as shown in Fig. 5). The methodology (1) uses edge computing capabilities to convert the traffic flow captured in urban traffic monitoring videos into text content; (2) applies a traffic flow prediction algorithm to predict relatively accurate traffic flow; and (3) dynamically interacts with the simulation environment's traffic flow configuration interface. The urban traffic environment is not built in isolation; it interacts with the simulation environment. The process is as follows:
Fig. 5 The process of constructing the real urban traffic environment.
(1) Establish a data set of urban traffic flow;
(2) Design a forecasting model for traffic flow;
(3) Develop an interactive interface with the simulation environment.
The simulation environment interacts with the traffic flow anticipated by the traffic flow prediction model, allowing the simulation environment to configure real-time traffic flow.
General-MARL
In this section, we design a TSC method, named General-MARL, based on the GCTCS.
The General-MARL algorithm is designed according to the layers of the GCTCS architecture. It is therefore composed of three sub-algorithms: the algorithm at the edge computing layer (Edge-General-control), the algorithm at the fog computing layer (Fog-General-control), and the algorithm at the cloud computing layer (Cloud-General-control). Additionally, GCTCS uses layer-to-layer connections, which give a clearer structure and lower complexity than point-to-point connections, making General-MARL feasible as the computing ability of edge and fog devices increases.
The Cloud-General-control produces the global control signal for each fog node's parameter updating based on the joint state (all traffic states from each node) and the playback mechanism (historical interactions from fog nodes). The Fog-General-control generates a TSC signal according to the traffic state abstracted by Edge-General-control, the intelligent communication between agents, which allows an agent to share its learned strategies, and the parameters from the cloud layer. Deep neural networks
[ 41] are the basic component in General-MARL for generating control actions, communication information, and so on. The structure of the algorithm architecture is illustrated in
Fig. 6.
Fig. 6 The General-MARL is composed of three sub-algorithms based on different layers of the GCTCS architecture.
Edge-General-control
The Edge-General-control algorithm processes the traffic surveillance video based on the target detection algorithm YOLO
[ 38] , detecting the vehicles' location at the intersection.
As shown in
Fig. 7, we apply an open-source vehicle image dataset, i.e., BIT-Vehicle
[ 42] , to fine-tune the YOLO algorithm for precise vehicle detection. The center of each intersection is the center point (0,0). We get the vehicle position from the YOLO-based model and add two fully connected (FC) layers for fine-tuning. Then, we obtain the direction information according to the vehicle's location relative to the center position. Hence, traffic state text is abstracted from the traffic surveillance video, including the vehicle-id, type, direction, and timestamp of passing vehicles.
Fig. 7 The process of abstracting traffic information from the video to generate the text information of the traffic flow. Detection of vehicles in the three regions stands for the different directions of passing vehicles.
Finally, we introduce the traffic flow prediction algorithm GCN-GAN
[ 40] , which predicts the traffic flow and synchronizes with the simulation environment in real time (
Algorithm 1).
1 Edge-General-control algorithm. |
1: Input the video information of the traffic situation. |
2: Capture one frame from the video: G |
3: Use the fine-tuned YOLO to recognize the position (x,y) and type of every vehicle in G. |
4: for vehicle in G do |
5: Obtain the direction of the vehicle by checking which of the three predefined regions contains (x,y). |
6: Record the traffic state text information (vehicle-id, type, direction, and timestamp) into traffic state text T. |
7: end for |
8: Apply GCN-GAN to T for traffic flow prediction. |
9: Connect traffic flow prediction capability to the urban simulation environment. |
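Steps 4 to 7 of the algorithm above can be sketched in Python as follows. The region boundaries, the region names, and the `TrafficRecord` type are hypothetical; the three regions used in this paper are defined graphically in Fig. 7 and their coordinates are not given in the text. The detections are assumed to come from an upstream YOLO-based detector, which is not shown.

```python
from dataclasses import dataclass

# Hypothetical regions in intersection-centered coordinates, with the
# intersection center at (0, 0): name -> (x_min, x_max, y_min, y_max).
REGIONS = {
    "left_turn": (-30.0, -10.0, -30.0, 30.0),
    "straight": (-10.0, 10.0, -30.0, 30.0),
    "right_turn": (10.0, 30.0, -30.0, 30.0),
}


@dataclass
class TrafficRecord:
    """One line of traffic state text for a passing vehicle."""
    vehicle_id: int
    vehicle_type: str
    direction: str
    timestamp: float


def direction_of(x, y, regions=REGIONS):
    """Return the name of the region containing (x, y), or "unknown"."""
    for name, (x0, x1, y0, y1) in regions.items():
        if x0 <= x < x1 and y0 <= y <= y1:
            return name
    return "unknown"


def frame_to_records(detections, timestamp):
    """Convert detections of one frame into traffic state records.

    detections: iterable of (vehicle_id, vehicle_type, x, y) tuples,
    assumed to be produced by the vehicle detector.
    """
    return [
        TrafficRecord(vid, vtype, direction_of(x, y), timestamp)
        for vid, vtype, x, y in detections
    ]
```

The resulting records play the role of the traffic state text T that is passed on to the traffic flow prediction step.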
Fog-General-control
The Fog-General-control algorithm includes the Nash-MARL module and the communication module to generate control strategies for urban traffic signals.
Nash-MARL Module: The Nash-MARL module dynamically seeks a Nash equilibrium for TSC without prior knowledge
[ 43] . It defines that
is the traffic environment state space for multi-intersection;
is the joint action from agents to tune transfer signal;
is the action space of one agent
; joint action is
.
is the reward or objective function. The object of
is to choose a proper strategy
and to maximize the objective function
.
Herein, the Bellman equation with Nash Equilibrium: firstly, fix other agents' policies ; optimize the objective function of :
Where is ' policy according to other agents' action selection policies. and . The result of the objective (reward) function is based on the policy selection for participated each agent. It automatically generates its action policy by considering the behavior of other agents. When the behavior of other agents is stable, tries to optimize the behavior of the objective function. All agents' policies could achieve Nash equilibrium after iteration update.
In the design of the Nash-MARL Module, the computation of the critic function (advantage) in deep reinforcement learning might have positive (good) and negative (bad) reward results, making the learning process efficiently. Thus, the Nash-MARL module defines the function of an as ; the estimate value function of is , wherein is the union action, and is the union state.
It employs the actor-critic model
[ 44] as a framework. In the actor-critic model, the actor is a policy function and the critic is a value function. The parameter set
into the value function parameter set and the policy function parameter set
, where
represents the model of the value function
;
represents the parameters of the agent (participant) action selection policy
.
is the next state of
. The model samples M epochs. The object of the algorithm is to minimize the loss of the sampled data and the Nash-Bellman equation
:
The module defines for simplifying the above expression.
Inspired by deep Q-networks [7], the module also introduces a memory buffer (replay buffer) to store transition tuples, each consisting of the previous state of the environment, the joint action performed, the next state of the environment, and the reward obtained in this transition. The module can use stochastic gradient descent (SGD) to update the parameters after randomly sampling stored data from the replay buffer. To improve the action plan, the algorithm also employs ϵ-greedy exploration.
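The replay buffer and ϵ-greedy exploration can be sketched in Python as below. This is a generic illustration rather than the implementation used in this paper: the class name, the default capacity, and the use of a plain list of action values in place of a neural network output are all assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (state, joint_action, next_state, reward)."""

    def __init__(self, capacity=10000, seed=None):
        # The oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, state, joint_action, next_state, reward):
        self.buffer.append((state, joint_action, next_state, reward))

    def sample(self, batch_size):
        """Draw a random minibatch without replacement."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=action_values.__getitem__)
```

Sampling at random breaks the temporal correlation between consecutive transitions, which is the usual motivation for a replay buffer in deep Q-network style training.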
Additionally, the idea of parallel space-time is incorporated to enable the simultaneous execution in various environments. To achieve a stable learning process, multiple explorations in multiple environments would explore different policies and accelerate the learning speed. The overall algorithm structure is shown in Fig. 6. Further, the Nash-MARL algorithm process is detailed in Algorithm 2.
1: | Init Episode , ; Minibatch Size , Game Step N |
2: | Init Replay Buffer , and |
3: | repeat |
4: | Reset, go to the |
5: | repeat |
6: | Select or select randomly (e.g., ϵ-greedy) |
7: | Observe |
8: | Store to Replay Buffer |
9: | Sampling from Replay Buffer: |
10: | Optimize fix update |
11: | Optimize fix update |
12: | until |
13: | until |
14: | return and |
Communication Module: The communication module between agents allows an agent to share the learned strategies with others based on the attention mechanism
[ 45] .
The calculation steps in the communication module
[ 14] at time
are shown in
Fig. 8. At a time
in the communication module, the environment input is
and the corresponding communication information input is
. Multi-agents
are able to interact with each other. Each agent receives information from receivers and transmitters internally. The receiver receives its own environmental information
and communication information
, and generates action and external interaction information group
at
.
Fig. 8 The communication process between agents in the communication module.
In the communication module (as shown in Algorithm 3), the parameter set of for each agent is . Furthermore, is divided into the sender and receiver . The parameters of the sending side and the receiving end are optimized by the overall multi-agent objective function. The parameter sets of the receiver and the sender are updated iteratively for each agent.
3 The communication module. |
1: Initialize the communication matrix of all agents |
2: Initialize the parameters of agent and |
3: repeat |
4: Receiver of : use attention mechanism to generate communication matrix |
5: Sender of : chooses an action from policy selection network, or randomly chooses an action (e.g., ϵ-greedy exploration) |
6: Sender of : generate its own information through the receiver's communication matrix |
7: Collect all the joint actions of Agent and execute the actions , get the reward from the environment and next state |
8: until End of Round Episode |
9: return and for each agent |
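To illustrate how a receiver might weight incoming messages with an attention mechanism, the following Python sketch uses scaled dot-product attention. This is a stand-in chosen for illustration: the exact attention form, the vector dimensions, and the function names are assumptions and are not taken from the communication module described here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, messages):
    """Aggregate messages from other agents for one receiver.

    query:    vector describing the receiving agent (assumed).
    keys:     one vector per sending agent, same length as query.
    messages: one message vector per sending agent.
    Returns the attention weights and the weighted sum of messages.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(messages[0])
    mixed = [
        sum(w * msg[i] for w, msg in zip(weights, messages))
        for i in range(dim)
    ]
    return weights, mixed
```

Stacking the weight vectors of all receivers row by row gives one possible reading of the communication matrix mentioned in the algorithm above.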
The Fog-General-control algorithm is given below (Algorithm 4):
1: Apply the communication module: |
2: Initialize the communication matrix of all fog computing node Agents |
3: Initialize the parameters and of the fog computing node Agents |
4: Receive the global parameter sets and distributed by the cloud computing node and initialize the parameter sets and |
5: Initialize the Episode , ; the minimum batch size Minibatch Size , the number of game steps |
6: Apply the Nash-MARL Module: |
7: Initialize the memory record Replay Buffer |
8: repeat |
9: | Reset the environment and enter the initial state |
10: | repeat |
11: | Choose joint action or randomly choose joint action (e.g., ϵ-greedy exploration) |
12: | Observe the state-action-state triplet |
13: | Store triples in the Replay Buffer |
14: | Extract data from the Replay Buffer |
15: | receiver uses Attention mechanism to generate communication matrix |
16: | The strategy choice network of the Agent sender chooses an action , or randomly chooses action a (e.g., ϵ-greedy exploration) |
17: | The sender generates its own information through the communication matrix at the receiving end |
18: | Collect the joint actions of all Agents, execute an action , get rewards and the next state from the environment |
19: | Optimization step , fixes updates |
20: | Optimization step , fixes updates |
21: | until |
22: until |
23: Return and |
Cloud-General-control
The Cloud-General-control algorithm deploys the Nash-MARL module, in which the fog nodes act as the 'parallel universes' of the Nash-MARL algorithm.
The Cloud-General-control algorithm is given in Algorithm 5:
1: Apply the Nash-MARL module: |
2: Initialize the global parameter sets and of the cloud computing center and the global counter |
3: repeat |
4: Distribute global parameters to fog computing nodes , |
5: repeat |
6: Update global parameters , |
7: until all fog computing nodes are traversed and collected |
8: |
9: until |
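The distribute-collect-update loop of the cloud layer can be sketched in Python as follows. The update rule is an assumption: plain averaging of the parameters returned by the fog nodes is used here as a simple stand-in, and it should not be read as the update actually performed by Cloud-General-control. The function name and the representation of fog nodes as callables are also hypothetical.

```python
def cloud_round(global_params, fog_nodes):
    """One outer iteration: distribute, collect, then update.

    global_params: list of floats held by the cloud computing center.
    fog_nodes:     callables that take a copy of the global parameters
                   and return locally updated parameters (assumed API).
    """
    if not fog_nodes:
        return list(global_params)
    collected = []
    for node in fog_nodes:
        # Distribute a copy so that a node cannot mutate the global state.
        collected.append(node(list(global_params)))
    n = len(collected)
    # Assumed update rule: element-wise average over all fog nodes.
    return [
        sum(params[i] for params in collected) / n
        for i in range(len(global_params))
    ]
```

Calling `cloud_round` repeatedly until a stopping condition is met mirrors the outer repeat loop of the algorithm above.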
Experiments
In this section, we apply General-MARL and baseline methods to the integrated and cooperative architecture GCTCS for multi-intersection TSC. The experiments are run in two settings: one that ignores network bandwidth and network delay, and one that considers them.
Dataset
The dataset from Lanzhou, China consists of two parts: (1) the traffic network and (2) the traffic flow. The traffic network part describes lanes, roads, intersections, and signal phases. As shown in Fig. 9, there are 27 intersections connecting 45 roads. The entire traffic network is simulated in the simulation environment of our GCTCS. Each intersection's signal can be controlled by various TSC algorithms. The distance between two intersections is two to four kilometers (km). The speed limit is 60 km/h.
Fig. 9 Topology map of the real traffic intersections.
The initial flow of incoming and outgoing vehicles at each intersection is configured based on the real traffic flow data. The traffic flow dataset contains vehicles travel information, which is described as , where is the time that each vehicle starts entering the traffic network and is the pre-planned route from its original location to its destination.
Parameter settings of General-MARL
Cloud computing center
We deploy the Cloud-General-control algorithm in a Docker environment [46], logically away from the intersections, and set the communication delay from the cloud center to the fog computing node in the simulation code as a sleep interval (0 s to ignore network delay; 10 s to consider network delay). The policy-generation network in the cloud computing node has four hidden layers with 40, 80, 80, and 80 nodes, respectively, connected through the ReLU activation function [47]; the main neural network has three hidden layers, each containing 40 nodes, also connected through ReLU. The learning rate is set to 0.001.
Fog computing node
We deploy the Fog-General-control algorithm in the Docker environment and set it to be logically close to the intersection. The delay from the intersection to the fog computing node is set as a sleep interval in the simulation code (0 s to ignore network delay; 1 s to consider network delay). Twenty-seven fog computing nodes are deployed in the environment. The policy-generation network in each node has four hidden layers with 40, 80, 80, and 40 nodes, respectively, and the main neural network has three hidden layers, each containing 40 nodes. All layers are connected through ReLU activation. The learning rate is set to 0.001.
Edge computing node
We deploy the Edge-General-control algorithm in the Docker environment at one intersection. The network delays from the edge computing nodes to the fog computing node and to vehicles are both set as sleep intervals (0 s to ignore network delay; 1 s to consider network delay).
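The delay settings of the three layers can be emulated with a sleep before each message is delivered, as in the Python sketch below. The delay values follow the settings described in this section; the dictionary keys, the function name, and the injectable `sleep` argument are hypothetical and serve only to make the sketch easy to test.

```python
import time

# Delays in seconds: 0 s everywhere to ignore network delay; otherwise
# 10 s between cloud and fog, and 1 s on the fog and edge links.
DELAYS = {
    "ignore": {"cloud_fog": 0.0, "intersection_fog": 0.0, "edge": 0.0},
    "consider": {"cloud_fog": 10.0, "intersection_fog": 1.0, "edge": 1.0},
}


def send(message, link, mode="consider", sleep=time.sleep):
    """Deliver a message over a link after the configured delay.

    link:  one of "cloud_fog", "intersection_fog", "edge".
    mode:  "ignore" or "consider" network delay.
    sleep: function used to wait; replaceable in tests.
    """
    delay = DELAYS[mode][link]
    if delay > 0:
        sleep(delay)
    return message
```

Switching `mode` between the two settings reproduces the two experimental conditions without changing the control algorithms themselves.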
Initial traffic light period setting
The traffic control period is set to 60 s initially. The intervals of the green, red, and yellow lights are set to 27 s, 27 s, and 6 s, respectively.
Vehicle simulation
According to the actual traffic flow forecasting algorithm, the traffic flow is predicted every 15 min, and the vehicle simulation is conducted at each intersection.
Evaluation mechanism
In the simulation environment, the passing time of all vehicles at each intersection in an Episode is recorded, indexed by the number of Episodes E, the number of vehicles, and the number of intersections I. In this configuration, we set E = 1,000 and I = 27.
Methods for comparison
Fixed-Time [ 25] : a policy that uses a fixed cycle length with predefined green time among all phases. The green and red intervals are fixed at 27 s and the yellow interval at 6 s, the same as the initial intervals of the other methods for a fair comparison.
Q-learning [ 32] : a model-free reinforcement learning algorithm that learns the value of an action in a particular traffic state based on Q-learning
[ 48] , leveraging the advantages of deep neural networks to address the TSC problem. The algorithm can be deployed on the Fog or Edge node with the same parameters as shown in
Table 1.
Table 1 List of parameters in this paper.
Module | Parameters | Description |
Cloud Computing Center | | The delay from the cloud to the fog node |
Cloud Computing Center | 20, 60, 60, 20 | The hidden layers in the network |
Cloud Computing Center | 0.001 | The learning rate |
Fog Computing Node | | The delay from intersection to the fog node |
Fog Computing Node | 20, 60, 60, 20 | The hidden layers in the network |
Fog Computing Node | 0.001 | The learning rate |
Edge Computing Node | | The delay from edge nodes to fog node |
Experiment Settings | | The initial intervals of the green, red and yellow |
Experiment Settings | 15 min | The traffic flow prediction period |
Experiment Settings | E = 1,000 | The number of Episodes |
Experiment Settings | I = 27 | The number of intersections |
Experiment Settings | l = 0.001 | The learning rate |
Experiment Settings | γ = 0.982 | The discount rate |
Nash-Q [49]: a model that integrates Q-learning [48] and Nash Equilibrium [50] so that agents learn a better cooperative strategy. It can be deployed on the Fog node.
Nash-DQN [ 51] : a Deep-Q-learning methodology for model-free learning of Nash Equilibrium for general-sum stochastic games.
MAAC [14]: the multi-agent auto communication (MAAC) algorithm, an adaptive global TSC method based on MARL. MAAC combines a multi-agent auto communication protocol with MARL, allowing an agent to communicate its learned strategies to others in order to achieve global optimization in traffic signal control. The intervals of the green, red and yellow signals are fixed as in the experiment settings.
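For reference, the standard tabular Q-learning update that underlies the Q-learning baseline can be sketched as below, using the learning rate (l = 0.001) and discount rate (γ = 0.982) listed in Table 1. The state labels and the two-action set are hypothetical placeholders, and this sketch is not the implementation used in the experiments.

```python
from collections import defaultdict

LEARNING_RATE = 0.001  # l in Table 1
DISCOUNT = 0.982       # gamma in Table 1
ACTIONS = ("keep_phase", "switch_phase")  # hypothetical action set


def make_q_table():
    """Q-table mapping state -> {action: value}, initialised to zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(q, state, action, reward, next_state):
    """Apply one Q-learning step and return the updated Q(state, action)."""
    best_next = max(q[next_state].values())
    td_target = reward + DISCOUNT * best_next
    q[state][action] += LEARNING_RATE * (td_target - q[state][action])
    return q[state][action]
```

Each call moves Q(state, action) a small step toward the reward plus the discounted best value of the next state; with a learning rate of 0.001 the table changes slowly, which fits a long training run.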
Experimental process
Each experiment consists of two parts: the first part ignores the network delay and compares results with other TSC algorithms; in the other part, network delay is considered.
Ignoring network delay
The delays of the edge computing, fog computing, and cloud computing nodes in General-MARL are all set to 0, so network latency is not considered. In the urban simulation environment, we input actual traffic data before configuring the various TSC algorithm models via the algorithm interface. The baseline methods and the General-MARL algorithm are then run in the simulation environment. The traffic flow at each intersection is updated every 15 min. We record the waiting time of vehicles at each traffic intersection every minute (used as the environmental feedback reward) and train for 1,000 Episodes (rounds).
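The timing of this procedure (a flow update every 15 min and a reward sample every minute) can be sketched as a simple loop. The episode length, the callables `predict_flow` and `measure_waiting_time`, and the choice of the negative waiting time as the reward are assumptions for illustration; the text only states that the waiting time serves as the feedback reward.

```python
FLOW_UPDATE_PERIOD_MIN = 15  # traffic flow is re-predicted every 15 min


def run_episode(episode_minutes, predict_flow, measure_waiting_time):
    """Simulate the per-minute schedule of one episode.

    predict_flow(minute) -> flow and
    measure_waiting_time(minute, flow) -> seconds
    are hypothetical callables supplied by the simulator.
    """
    rewards = []
    flow = None
    flow_updates = 0
    for minute in range(episode_minutes):
        if minute % FLOW_UPDATE_PERIOD_MIN == 0:
            flow = predict_flow(minute)  # refresh the predicted flow
            flow_updates += 1
        waiting_s = measure_waiting_time(minute, flow)
        rewards.append(-waiting_s)  # assumed: shorter waiting is better
    return rewards, flow_updates
```

For a hypothetical 60-minute episode, this loop records 60 reward samples and refreshes the predicted flow four times.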
As shown in Fig. 10, the training convergence of the General-MARL algorithm is as good as that of the Nash-DQN algorithm, but General-MARL does not perform the best.
Fig. 10 Multi-intersection traffic signal control training process (without network delay).
Considering network delay
When network delay is considered, the delays of the edge computing, fog computing, and cloud computing nodes in General-MARL are all set according to the parameters in the previous section. We then configure the fixed traffic signal durations as in the previous setting. Q-learning (edge) deploys the Q-learning algorithm on the fog computing node to control the traffic lights, and the delay from the controlling agent to the intersection is 1 s. There are 27 Docker containers with Q-learning capabilities deployed at the 27 traffic intersections, each generating the traffic flow control signals for its own intersection.
Q-learning (center) and DQN use only the cloud computing center to collect the information at each intersection, perform overall control, and generate the control commands for each intersection. The traffic flow prediction algorithm (GCN-GAN) then updates the traffic flow at each intersection every 15 min. GCTCS records the waiting time of vehicles at each intersection every minute (as the environmental feedback reward) and conducts the training for 1,000 Episodes.
As shown in Fig. 11, General-MARL converges faster than Nash-DQN, and its training results outperform those of the other baseline algorithms.
Fig. 11 Multi-intersection traffic signal control training process (with network delay).
Experimental results
The results of the experiment are also split into two parts: the first part ignores network delay, and the second part considers network delay.
Ignoring network delay
After the training described in the previous section, 500 test episodes were run to generate the experimental results, as shown in Table 2.
Table 2 Results of General-MARL and other algorithms (ignoring network delay).
Method | Average speed (km/h) | Average waiting time (s) |
Fixed-time | 10.17 | 166.70 |
Q-learning | 18.43 | 135.62 |
DQN | 20.10 | 112.24 |
A3C | 24.12 | 90.73 |
Nash-Q | 29.70 | 70.14 |
Nash-DQN | 33.81 | 61.21 |
MAAC | 27.39 | 80.21 |
General-MARL | 31.22 | 62.87 |
The experimental results show that, in terms of the average speed and average waiting time of vehicles at the 27 intersections, the General-MARL algorithm achieves performance similar to that of the Nash-DQN algorithm. General-MARL is not the best performer when network delay is ignored, since Nash-DQN is slightly better on both metrics, but it outperforms all the other baseline algorithms.
Considering network delay
After training in the simulation, we ran 500 test episodes and collected the experimental results, as shown in Table 3. Regarding the average waiting time at intersections, the overall performance of General-MARL is the best, surpassing all the other algorithms, with an average decrease of 18.3% in waiting time compared to the baseline algorithms. As for the average speed of vehicles at intersections, the General-MARL algorithm increases the average speed by 10.2% when network delay is considered. As shown in Table 4, the General-MARL algorithm outperforms the other learning-based algorithms in terms of the accumulated waiting time of vehicles, network delay time, and delay rate. The delay rate is reduced by 7.4%. Furthermore, the General-MARL algorithm optimizes each intersection to a certain extent, as shown in Table 5.
Table 3 Results of the average speed and waiting time in each episode (considering network delay).
Method | Average speed (km/h) | Average waiting time (s) |
Fixed-time | 10.15 | 166.71 |
Q-learning (center) | 21.69 | 182.75 |
Q-learning (edge) | 23.47 | 155.64 |
DQN | 24.94 | 132.70 |
A3C | 26.12 | 108.65 |
Nash-Q | 28.32 | 105.55 |
Nash-DQN | 30.86 | 106.37 |
MAAC | 27.61 | 116.81 |
General-MARL | 30.78 | 92.48 |
Table 4 Results of accumulated time and network delay in each episode (considering network delay).
Method | Accumulated time (s) | Network delay (s) | Delay rate |
Fixed-time | 38827.5 | 0.0 | 0.0% |
Q-learning (center) | 45448.0 | 10809.7 | 23.8% |
Q-learning (edge) | 35789.3 | 6522.9 | 18.2% |
DQN | 33789.9 | 8997.2 | 26.6% |
A3C | 31340.4 | 7792.1 | 24.9% |
Nash-Q | 30940.5 | 7536.9 | 24.4% |
Nash-DQN | 31994.6 | 7937.7 | 24.8% |
MAAC | 28940.7 | 6552.1 | 22.6% |
General-MARL | 26912.7 | 3264.5 | 12.2% |
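The Delay rate column in Table 4 appears to be the network delay divided by the accumulated time. This is an inference from the reported numbers rather than a definition stated in the text. The sketch below recomputes the rate for each row under that assumption; the recomputed values agree with the reported ones to within about 0.1 percentage point.

```python
# (accumulated time s, network delay s, reported delay rate %) from Table 4
TABLE_4 = {
    "Fixed-time": (38827.5, 0.0, 0.0),
    "Q-learning (center)": (45448.0, 10809.7, 23.8),
    "Q-learning (edge)": (35789.3, 6522.9, 18.2),
    "DQN": (33789.9, 8997.2, 26.6),
    "A3C": (31340.4, 7792.1, 24.9),
    "Nash-Q": (30940.5, 7536.9, 24.4),
    "Nash-DQN": (31994.6, 7937.7, 24.8),
    "MAAC": (28940.7, 6552.1, 22.6),
    "General-MARL": (26912.7, 3264.5, 12.2),
}


def delay_rate_percent(accumulated_s, network_delay_s):
    """Assumed definition: share of accumulated time spent on network delay."""
    if accumulated_s <= 0:
        raise ValueError("accumulated_s must be positive")
    return 100.0 * network_delay_s / accumulated_s


computed = {
    method: delay_rate_percent(acc, delay)
    for method, (acc, delay, _reported) in TABLE_4.items()
}
```

Under this reading, a method can lower its delay rate either by reducing the network delay itself or by spending a smaller fraction of the accumulated time waiting on communication.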
Table 5 Results of average waiting time at each intersection in each episode (considering network delay).
ID | Fixed-time | Q-edge | DQN | A3C | Nash-Q | Nash-DQN | MAAC | General |
1 | 175.27 | 161.96 | 136.93 | 110.89 | 109.34 | 111.21 | 124.69 | 96.81 |
2 | 188.35 | 172.68 | 145.58 | 116.43 | 115.88 | 119.02 | 130.73 | 105.46 |
3 | 155.28 | 145.58 | 123.71 | 102.43 | 99.35 | 99.27 | 120.99 | 83.59 |
4 | 197.72 | 180.36 | 151.78 | 120.39 | 120.57 | 124.61 | 133.34 | 111.66 |
5 | 155.23 | 145.54 | 123.68 | 102.41 | 99.32 | 99.24 | 116.97 | 83.56 |
6 | 168.97 | 156.8 | 132.76 | 108.22 | 106.19 | 107.45 | 122.26 | 92.64 |
7 | 157.68 | 147.55 | 125.30 | 103.44 | 100.55 | 100.7 | 107.91 | 85.18 |
8 | 161.32 | 150.53 | 127.70 | 104.98 | 102.37 | 102.88 | 119.31 | 87.58 |
9 | 185.23 | 170.13 | 143.52 | 115.11 | 114.32 | 109.88 | 128.53 | 103.40 |
10 | 176.47 | 162.95 | 137.72 | 111.40 | 109.94 | 111.92 | 115.15 | 97.60 |
11 | 161.21 | 150.44 | 127.63 | 104.94 | 102.31 | 102.81 | 119.28 | 87.51 |
12 | 125.96 | 121.55 | 104.32 | 90.01 | 104.30 | 81.77 | 105.69 | 64.20 |
13 | 131.62 | 126.19 | 108.06 | 92.41 | 87.52 | 85.15 | 117.87 | 67.94 |
14 | 169.15 | 156.95 | 132.88 | 108.30 | 106.28 | 107.55 | 122.33 | 92.76 |
15 | 175.52 | 180.84 | 152.16 | 120.64 | 109.47 | 124.96 | 40.06 | 112.04 |
16 | 132.47 | 126.89 | 108.62 | 92.77 | 87.94 | 85.65 | 108.20 | 68.50 |
17 | 164.12 | 152.83 | 129.56 | 87.56 | 103.77 | 104.55 | 113.46 | 89.44 |
18 | 166.87 | 155.08 | 131.38 | 107.33 | 105.14 | 106.19 | 121.45 | 91.26 |
19 | 150.39 | 141.57 | 120.48 | 100.36 | 96.90 | 96.35 | 115.11 | 80.36 |
20 | 177.77 | 164.01 | 138.58 | 111.95 | 110.59 | 112.70 | 125.65 | 98.46 |
21 | 153.63 | 144.23 | 122.62 | 101.73 | 98.52 | 98.29 | 96.35 | 82.50 |
22 | 167.54 | 155.63 | 131.82 | 107.62 | 105.48 | 106.59 | 121.71 | 91.70 |
23 | 177.62 | 163.89 | 162.05 | 157.98 | 110.52 | 112.61 | 139.54 | 121.93 |
24 | 186.84 | 171.45 | 144.58 | 115.79 | 115.13 | 118.11 | 129.15 | 104.46 |
25 | 175.36 | 162.04 | 136.99 | 110.93 | 109.39 | 111.26 | 128.73 | 96.87 |
26 | 175.23 | 161.93 | 136.90 | 110.87 | 109.32 | 111.18 | 124.67 | 96.78 |
27 | 188.35 | 172.68 | 145.58 | 116.43 | 115.88 | 119.02 | 104.64 | 102.77 |
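As a worked check linking Table 5 to Table 3, the mean of the General column over the 27 intersections can be recomputed and compared with the 92.48 s average waiting time reported for General-MARL in Table 3. The variable names are illustrative; the values are copied from the General column above.

```python
from statistics import mean

# General-MARL average waiting time (s) at intersections 1..27, from Table 5
GENERAL_WAITING_S = [
    96.81, 105.46, 83.59, 111.66, 83.56, 92.64, 85.18, 87.58, 103.40,
    97.60, 87.51, 64.20, 67.94, 92.76, 112.04, 68.50, 89.44, 91.26,
    80.36, 98.46, 82.50, 91.70, 121.93, 104.46, 96.87, 96.78, 102.77,
]

# Mean over all intersections, to compare with the Table 3 entry
network_mean_s = mean(GENERAL_WAITING_S)
best_intersection = GENERAL_WAITING_S.index(min(GENERAL_WAITING_S)) + 1
worst_intersection = GENERAL_WAITING_S.index(max(GENERAL_WAITING_S)) + 1
```

The same per-column averaging can be applied to the other methods to relate their per-intersection results to the network-wide averages.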
Overall analysis
The analysis of experimental results reveals that network delay has a significant impact on the actual application effect of the TSC algorithm. Although General-MARL showed good performance even when network delay was ignored, implementation of the method in urban TSC should take into account the influence of network latency for industrial deployment.
With network delay considered, General-MARL surpasses the baseline algorithms on average vehicle speed at intersections, average vehicle waiting time, cumulative vehicle waiting time, network delay time, and delay rate. The average speed of vehicles increases by 23.2%, and the network latency decreases by 11.7%, as shown in Fig. 12.
Fig. 12 Comparative results among different algorithms.
Conclusions
In this paper, we proposed an integrated and cooperative IoT architecture GCTCS and the General-MARL algorithm for multi-intersection traffic signal control. The results from the proposed framework and algorithm showed that the average speed of vehicles was increased by 23.2%, and the network latency was reduced by 11.7%, when compared with baseline algorithms. Our results proved that the application of MARL in urban traffic signal control needs to consider multiple factors, including the simulation environment, algorithm process, deployment architecture, and network delay. The results also validated the best performance of the proposed General-MARL. Therefore, GCTCS and General-MARL showed great potential in practical applications and theoretical contributions for multi-intersection traffic signal control with large-scale deployment in real road networks.
In traffic signal control, not only the vehicles' but also the pedestrians' behaviours [52], such as crossing times and waiting times, are affected. In future studies, pedestrians will be included to cover all the road users at intersections. We will be able to further validate our proposed methodology with real-life case studies, because we have been invited by a city in China to deploy GCTCS and our algorithm in a test area.