1 Introduction
The functionality of modern cities relies heavily on infrastructure networks (INs), such as those for water, power, and transportation. These networks play crucial roles in improving economic prosperity and enabling the movement of resources (
Carnevali et al., 2020;
Xu and Chopra, 2022;
Xie et al., 2023;
Dui et al., 2024a). However, disasters such as earthquakes and typhoons can damage the components (nodes and links) of INs (
Lu et al., 2018;
Liu and Wang, 2021). With increasing concerns related to attacks on the cyber and physical components associated with networks, as well as the need to mitigate the effects of natural disasters on networks, a variety of postdisaster restoration countermeasures (e.g., monitoring, inspection, and repair) have been proposed in the literature (
Fu et al., 2022;
Ganganath et al., 2018;
Wang et al., 2021;
Yu et al., 2017;
Dui et al., 2024b).
Resilience has emerged as a critical and desirable characteristic, and there is an abundance of detailed reviews and comparative analyses of qualitative frameworks, as well as quantitative metrics for studying resilience (
Bai et al., 2021;
Zuo, 2021;
Si et al., 2020;
Das et al., 2020;
Ahmadian et al., 2020;
Dui et al., 2024d). Moreover, two fundamental yet challenging issues in postdisaster restoration are repair sequence (
Xu et al., 2022;
Zhang et al., 2022;
Wu et al., 2022;
Du and Wu, 2022) and routing (
Ruan et al., 2021;
Allal et al., 2021;
Díaz-Ramírez et al., 2014) decisions. Network resilience can be considered an optimization objective to recover network functionality within a finite repair time.
To increase network resilience, various models for emergency repair sequence decision-making have been proposed in the field of postdisaster restoration (
Xu et al., 2022;
Zhang et al., 2022;
Wu et al., 2022;
Du and Wu, 2022). For example, Xu et al. (
2022) developed a deterministic formulation of the repair sequencing decision problem for postdisaster INs under uncertain conditions. They proposed a two-stage stochastic model to solve the problem effectively, utilizing a scenario generation and reduction method to generate a limited number of repair time scenarios. Moreover, they proposed a heuristic algorithm for managing large-scale disruptions and incorporated an efficient enumeration algorithm as a module for addressing small-scale disruptions. Zhang et al. (
2022) studied the optimization of restoration schedules for damaged highway-bridge networks, focusing on maximizing a resilience measure associated with network travel time. They formulated restoration-scheduling problems as integer programs to determine optimal schedules. Wu et al. (
2022) proposed a novel method, which is based on a heuristic search algorithm and graph theory, for devising generator start-up sequences to select a feasible scheme with maximal power generation for system restoration strategies. Finally, in the context of utility grid outages, Du and Wu (
2022) presented a two-stage learning framework to identify optimal restoration strategies that improve system resilience.
In contrast to resilience-oriented repair sequence decision-making strategies, repair routing strategies focus on optimizing routes for repairing damage. Several repair routing strategies have been proposed for various applications (
Ruan et al., 2021;
Allal et al., 2021;
Díaz-Ramírez et al., 2014;
Dui et al., 2024c). For example, Ruan et al. (
2021) studied an operational aircraft repair routing problem and developed an integer linear programming framework based on network flow. They also developed a reinforcement learning-based algorithm that considers factors such as flying hours, workforce capacity, and the number of take-offs between repair checks. Allal et al. (
2021) developed a simulation-optimization approach for routing and scheduling wind turbine repairs in offshore wind farms, utilizing the ant colony system (ACS) algorithm. Díaz-Ramírez et al. (
2014) proposed scheduling planning methods for airlines with a single fleet and single repair and crew base, addressing aircraft repair routing and crew scheduling problems. Dui et al. (
2024c) integrated repair strategies into unmanned vehicle distribution networks while considering cascading failures and proposed a resilience optimization (RO) algorithm.
However, limited attention has been given to RO strategies for IN restoration with multiple crews (
Dui et al., 2023). For a single crew, the RO strategy takes into account both repair sequence and routing decisions. For multiple crews, however, the focus is on developing a globally optimal strategy that considers the coupling between the crews' repair actions. Therefore, this paper adopts a two-layer decision-making architecture for enhancing resilience, determining optimal routes for multiple crews to repair damage, and determining the optimal sequence in which each crew repairs damaged nodes and links.
In recent years, significant advancements have been made in the theory and practice of deep reinforcement learning (DRL) (
Arulkumaran et al., 2017;
Silver et al., 2017;
Wu et al., 2021). DRL leverages deep neural networks (DNNs) to develop artificial agents capable of efficiently processing high-dimensional sensory inputs and leveraging past experiences to make informed decisions. One popular algorithm in this domain is the deep Q-network (DQN), which employs a value-function-based DRL approach to improve an agent’s decision-making capabilities (
Arulkumaran et al., 2017). For repair sequence decisions, the DQN method effectively utilizes degradation and damage information to support optimal strategies (
Fan et al., 2022). In light of managing multiple crews, Pinciroli et al. (
2021) proposed a novel sequential decision problem formulation for the operation and repair optimization of wind turbines over a long-term horizon via DRL based on proximal policy optimization (PPO). Additionally, methods such as deep deterministic policy gradient (DDPG) and double-DQN have been utilized to develop optimal maintenance schedules and plans for large-scale systems while minimizing costs (
Ogunfowora and Najjaran, 2023). However, in the context of RO strategies, it is essential to consider both the repair sequence and repair routing decisions. The repair routing decision must consider the delayed rewards of future repair actions. Monte Carlo tree search (MCTS) (
Browne et al., 2012) can estimate the repair action value for each damage state within a search tree by employing Monte Carlo rollouts. To calculate long-delayed rewards accurately, the search tree must expand significantly through rollout simulations.
To address the dual-layer decision-making characteristics of RO between multiple crews and a single crew, this paper proposes a DRL framework for IN restoration. Within this framework, an RO agent uses an actor-critic neural network (ACNN) to guide the MCTS. The actor head of the ACNN focuses the MCTS search on high-probability actions, whereas the critic head evaluates damage states within the search tree. The paper's motivations and contributions are as follows:
(1) An RO problem involving multiple crews for IN restoration is proposed, which combines repair sequences and routing decisions.
(2) The adoption of a DRL framework facilitates a globally optimal RO strategy for resilience enhancement, considering the dual-layer decision-making between multiple crews and a single crew.
(3) An actor-critic MCTS (AC-MCTS) method uses an ACNN to guide the MCTS search. The ACNN effectively utilizes IN damage information, allowing the MCTS to determine optimal repair routes and actions for multiple crews.
The subsequent sections of this paper are organized as follows: Section 2 presents the RO problem formulation. Section 3 explains the DRL-based RO method. Section 4 presents a case study, and Section 5 concludes with closing remarks.
2 Problem formulation
2.1 Formulation of the infrastructure network and its damage and repair
2.1.1 Operation
An IN is represented as an undirected graph consisting of nodes and edges. The graph is assumed to be sparse. It is also assumed to be connected, meaning that there is at least one path connecting any two nodes via a finite number of steps. To describe this IN, two matrices are needed: an adjacency matrix and a matrix of physical distances. The adjacency matrix has elements equal to 1 if there is an edge connecting node i to node j and 0 otherwise. The physical distance matrix contains elements representing the geographic distances between nodes i and j, even when there is no direct edge between them. The physical distance matrix can be calculated from the latitudes and longitudes of the IN nodes.
The shortest path length between nodes i and j is defined as the smallest sum of physical distances among all possible paths in the IN from node i to node j. Once the adjacency matrix and the physical distance matrix are known, they can be used to compute the matrix of shortest path lengths. The efficiency between nodes i and j can be expressed as follows:
where the weight of a node is taken in this paper to be its degree. When there is no path in the IN between nodes i and j under localized attacks, the corresponding shortest path length is infinite and the pairwise efficiency is zero. The average efficiency of the IN can be expressed as follows:
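The displayed efficiency formulas did not survive extraction, so the following is a minimal sketch of the operation model under an assumed (and common) weighted form, with the pairwise efficiency taken as the product of the node-degree weights divided by the shortest path length and set to zero for disconnected pairs; the paper's exact expression may differ.

```python
# Minimal sketch of the efficiency metric of Section 2.1.1. The pairwise form
# e_ij = (w_i * w_j) / d_ij with node degree as the weight is an assumption;
# unreachable pairs contribute zero efficiency.
import networkx as nx

def average_efficiency(G):
    """G: undirected networkx graph whose edges carry a 'length' attribute (km)."""
    dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="length"))
    w = dict(G.degree())            # node weight = node degree (as stated in the paper)
    nodes = list(G.nodes())
    total, pairs = 0.0, 0
    for i in nodes:
        for j in nodes:
            if i == j:
                continue
            pairs += 1
            d_ij = dist.get(i, {}).get(j)   # None if j is unreachable from i
            if d_ij:                         # unreachable pairs contribute 0
                total += w[i] * w[j] / d_ij  # assumed pairwise efficiency
    return total / pairs if pairs else 0.0
```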
2.1.2 Damage
Fig.1 shows an example of an IN and its corresponding graph. The damage center and radius are randomly generated. The localized attack area is highlighted in pink in Fig.1. It is assumed that all nodes and edges within this pink area fail. The geographically localized edges in the IN are removed, and the pink nodes are isolated. Therefore, the state of the IN edges can be described by the current adjacency matrix after the localized attack, and the state vector of the IN nodes can be described as follows:
where the entry for node i is 0 if the node is damaged and 1 otherwise.
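As a rough illustration of this damage model, the sketch below generates a localized attack by failing every node within a given radius of the damage center and removing its incident edges; the great-circle distance helper and the node attribute layout are assumptions, not the paper's implementation.

```python
# Sketch of the localized-attack model of Section 2.1.2: nodes within the damage
# radius of the damage center fail, and their incident edges are removed.
import math
import random
import networkx as nx

def geo_distance_km(p, q):
    """Great-circle (haversine) distance between (lon, lat) points in degrees."""
    (lon1, lat1), (lon2, lat2) = p, q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def apply_localized_attack(G, radius_km, center=None, seed=None):
    """Fail nodes within radius_km of the center; node attribute 'pos' = (lon, lat)."""
    rng = random.Random(seed)
    pos = nx.get_node_attributes(G, "pos")
    if center is None:
        center = pos[rng.choice(list(G.nodes()))]   # random damage center
    damaged = nx.Graph(G)                            # copy of the pre-attack graph
    node_state = {}                                  # 1 = normal, 0 = damaged
    for n, p in pos.items():
        hit = geo_distance_km(center, p) <= radius_km
        node_state[n] = 0 if hit else 1
        if hit:
            damaged.remove_edges_from(list(damaged.edges(n)))  # isolate damaged node
    return damaged, node_state, center
```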
2.1.3 Repair
This paper proposes a comprehensive and actionable strategy to support repair activities in the IN. The strategy considers various factors, such as the repair crews, the repair time sequence, and the objects to be repaired. The aim is to address the limitations of existing studies that focus primarily on the objects to be repaired.
The repair activities are carried out in parallel by multiple repair crews. The mean time taken to repair a destroyed node i is denoted by MTTRi. The time required to repair each edge can be calculated via the following equation:
where vedge denotes the repair speed for a damaged edge. The time to move from node i to node j can be expressed as follows:
where vmove denotes the movement speed of the crew.
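Since the displayed repair-time equations are missing, the following is a minimal sketch under the stated assumptions: an edge takes its length divided by vedge to repair, travel between nodes takes the geographic distance divided by vmove, and a damaged node takes the crew's MTTR for its level (the level thresholds follow Section 4.1.1).

```python
# Sketch of the repair-time model of Section 2.1.3; all quantities are per-crew
# parameters as listed in Tab.2, and the level thresholds follow Section 4.1.1.
def edge_repair_time(edge_length_km, v_edge_km_h):
    return edge_length_km / v_edge_km_h

def movement_time(distance_km, v_move_km_h):
    return distance_km / v_move_km_h

def node_repair_time(node_weight, mttr):
    """mttr: dict with keys '1st', '2nd', '3rd' (hours), one entry per node level."""
    if node_weight <= 2:
        return mttr["1st"]
    if node_weight <= 4:
        return mttr["2nd"]
    return mttr["3rd"]
```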
2.2 Resilience of infrastructure networks
Resilience is commonly used to assess the recovery performance of a system following localized attacks (
Bai et al., 2021;
Si et al., 2020;
Das et al., 2020). This paper introduces the concept of resilience in INs to evaluate the effectiveness of restoration strategies, aiming to enhance network performance and reduce recovery time.
To quantitatively measure the resilience of an IN, it is vital to define a metric that represents the functional performance of the network. This metric is referred to as the figure of merit (FOM) within a widely adopted framework (
Das et al., 2020). In this paper, the average efficiency is normalized as follows to define the FOM at time τ:
where the average efficiency at time τ is normalized by the average efficiency before the occurrence of the localized attack event. Furthermore, this definition of the FOM can also be based on other performance parameters of the IN.
A localized attack event is assumed to occur at a random time, causing a significant degradation in the functional performance of the IN. The RO strategy should be developed on the basis of damage information and involve multiple crews (explained in Section 3). These crews execute the restoration strategy to recover the IN from the localized attack, after which the functional performance of the IN improves and reaches the FOM threshold at the recovery time τr. The following metric of resilience is adopted to quantify the loss incurred by the IN due to a localized attack.
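The displayed FOM and resilience expressions were lost in extraction; a plausible reconstruction consistent with the surrounding definitions, in which the symbols $E_0$, $E(\tau)$, $t_a$, and $\tau_r$ (pre-attack average efficiency, average efficiency at time $\tau$, attack time, and recovery time) are assumed names, is:

$$\mathrm{FOM}(\tau)=\frac{E(\tau)}{E_0},\qquad \mathrm{RL}=\int_{t_a}^{\tau_r}\bigl[1-\mathrm{FOM}(\tau)\bigr]\,\mathrm{d}\tau .$$

Under this reading, the RL is the area between the pre-attack performance level and the FOM curve over the recovery window, so a faster and more complete recovery yields a smaller RL.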
2.3 Formulation of resilience optimization
The restoration of the IN under localized attacks is treated as an RO problem involving multiple crews. The recovery time and recovery level are the constraints, the resilience loss (RL) is the objective, and the repair actions are the basic decision variables. The goal of this RO problem is to minimize the RL with a finite recovery time and a predefined recovery level.
Thus, the RO problem with multiple crews can be expressed as follows:
where c denotes a crew in the crew fleet, the repair time steps of crew c form its own time step set, and the corresponding repair actions form the action sequence set of crew c. Constraint (10) defines the analogous fleet-level quantities: the repair time steps of the crew fleet, the fleet time step set, the repair action at each fleet time step, and the fleet action sequence set. Constraint (11) indicates the relationship between the fleet action sequence set and each crew action sequence set. Constraint (12) accumulates the time interval needed to execute each action and forces the total repair time of the crew fleet to be lower than the recovery time threshold, whereas Constraint (13) forces the IN performance to reach the FOM threshold at the recovery time τr.
This RO problem is a decision-making problem, and the RO strategy consists of a sequence of repair actions. The multiple crews execute the repair actions in this RO strategy to improve the IN performance and reach the FOM threshold at the recovery time τr. Different RO strategies have different values of RL and τr. The recovery time can be calculated according to Section 2.1.3, whereas the RL can be calculated according to Eq. (7). The optimal RO strategy aims to minimize RL and τr, which depend on the repair sequence and routing decisions.
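Because the displayed formulation (Eqs. (9)–(13)) was lost in extraction, the following is a hedged reconstruction that matches the constraint descriptions above; all symbols ($A_c$, $T_c$, $A_{\mathrm{fleet}}$, $\Delta t_{c,t}$, $\tau_{\max}$, $\mathrm{FOM_{th}}$) are assumed names, and the fleet recovery time is read here as the time at which the last crew finishes.

$$
\begin{aligned}
\min_{A_1,\dots,A_{|\mathcal{C}|}}\quad & \mathrm{RL} \\
\text{s.t.}\quad & A_{\mathrm{fleet}}=\bigcup_{c\in\mathcal{C}} A_c,\\
& \tau_r=\max_{c\in\mathcal{C}}\ \sum_{t\in T_c}\Delta t_{c,t}\le \tau_{\max},\\
& \mathrm{FOM}(\tau_r)\ge \mathrm{FOM_{th}},
\end{aligned}
$$

where $A_c$ and $T_c$ are the action sequence and time step set of crew $c$, $\mathcal{C}$ is the crew fleet, $\Delta t_{c,t}$ is the duration of crew $c$'s action at its time step $t$, $\tau_{\max}$ is the recovery time threshold, and $\mathrm{FOM_{th}}$ is the FOM threshold.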
3 Deep reinforcement learning-based resilience optimization method
3.1 Deep reinforcement learning framework for resilience optimization
For the RO problem, a DRL framework is developed in this paper, as shown in Fig.2. The framework illustrates the multiple-crew repair process (MCRP), the multiple-crew time sequence, resilience and reward, and neural network training. The details of the four parts of the framework are explained in Sections 3.1.1–3.1.4.
Fig.2 Deep reinforcement learning framework for IN restoration.
3.1.1 Multiple crew repair process
Before a localized attack, the IN has its original adjacency matrix. Then, a localized attack is randomly initialized with a damage center and a damage radius at the attack time. After that, the damaged adjacency matrix and node state vector are generated, and the corresponding FOM can be calculated according to Eq. (6).
To recover the IN, a crew fleet of multiple crews executes a sequence of repair actions during an MCRP. A two-layer decision-making architecture is developed for this MCRP, consisting of a fleet-level action sequence set and crew-level action sequence sets. The fleet-level action sequence set is defined as:
The crew-level action sequence set of crew c can be described as follows:
The repair action is defined as a node pair (i, j), which means that crew c repairs the edge from the normal node i to node j. After repairing the edge, if node j is a damaged node, the crew then continues to repair this node; otherwise, it continues to execute its next action. At each time step, an RO agent uses a search algorithm with an integrated ACNN to select a repair action for the crew that has just finished its last action.
The decision-making of this agent (more details of which are given in Section 3.2) can be described as follows:
where the current repair state consists of the repair state produced by the crew fleet and the repair state produced by crew c.
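The two-layer bookkeeping can be pictured as follows: each crew keeps its own action and finish-time sequences, and the fleet-level sequence interleaves the crews' actions in the order they are issued. The class and field names below are illustrative, not the paper's notation.

```python
# Illustrative bookkeeping for the two-layer MCRP of Section 3.1.1: crew-level
# sequences plus a fleet-level sequence that interleaves them as actions are issued.
from dataclasses import dataclass, field

@dataclass
class CrewLog:
    crew_id: int
    actions: list = field(default_factory=list)        # crew-level repair actions (i, j)
    finish_times: list = field(default_factory=list)   # crew-level finish times (h)

@dataclass
class FleetLog:
    crews: dict                                          # crew_id -> CrewLog
    fleet_actions: list = field(default_factory=list)    # fleet-level sequence of (crew_id, (i, j))

    def record(self, crew_id, action, finish_time):
        self.crews[crew_id].actions.append(action)
        self.crews[crew_id].finish_times.append(finish_time)
        self.fleet_actions.append((crew_id, action))
```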
3.1.2 Multiple crew time sequences
For a fleet with multiple crews, the two-layer decision-making architecture also has a fleet time sequence and crew time sequences during the MCRP. The time sequence set of crew c can be described as follows:
where each element represents the time at which this crew finishes the corresponding action. It can be calculated as follows:
where the time interval to execute the action (i, j) covers four situations. In the first situation, the crew is at node i, and node j is normal. In the second situation, the crew is at node i, and node j is damaged. In the third situation, the crew is not yet at node i, and node j is normal. In the fourth situation, the crew is not yet at node i, and node j is damaged. According to Eqs. (4) and (5), this time interval can be calculated as follows:
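Under the reading above, a minimal sketch of the action duration, combining the movement time and edge repair time of Eqs. (4) and (5) with the node repair time, is:

```python
# Sketch of the action-duration computation of Section 3.1.2 under the assumption
# that the four situations are: crew already at node i or not, and node j damaged
# or not. Speeds and MTTR values are per-crew parameters as in Tab.2.
def action_duration(at_node_i, node_j_damaged, move_dist_km, edge_len_km,
                    v_move_km_h, v_edge_km_h, mttr_j_h):
    dt = 0.0
    if not at_node_i:
        dt += move_dist_km / v_move_km_h   # travel to the start node i first
    dt += edge_len_km / v_edge_km_h        # repair the damaged edge (i, j)
    if node_j_damaged:
        dt += mttr_j_h                     # then repair node j itself
    return dt
```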
The fleet time sequence can be described as follows:
where each element represents the time at which the crew fleet finishes the corresponding fleet-level action.
3.1.3 Resilience and rewards
For action (i, j), if node j is normal and edge (i, j) is damaged, then, after the repair of edge (i, j), this edge is assumed to restart its operation. If both node j and edge (i, j) are damaged, then, after the repair of the edge and the node, both are assumed to restart their operation. Thus, the FOM is a step function of time according to Eq. (6). After finishing an MCRP, the RL can be calculated according to Eq. (7). The reward of this MCRP is defined by normalizing the RL as follows:
3.1.4 Actor-critic neural network training
For each time step, the RO agent uses an ACNN to guide a variant of the MCTS algorithm. The ACNN outputs a prior repair probability matrix, which guides the tree search in selecting repair actions, and a value v, which is a normalization parameter used to evaluate the reward of the current repair action selection. The search algorithm uses these two outputs to produce a search probability from which a high-quality repair action is selected. More details of the ACNN architecture are given in Section 3.2.
The ACNN training data for each time step t are stored as a tuple containing the repair state, the search probability, and the reward obtained for the executed action, together with empirical parameters. When the RO agent finishes an MCRP, one episode is complete, and the data are sampled from all time steps of this MCRP. The DNN then uses these data to adjust its parameters. The training targets maximize the similarity of the prior repair probability to the tree search probability and minimize the error between the predicted value and the reward. Thus, the loss function sums the cross-entropy losses (CELs) and mean squared errors (MSEs), and gradient descent is used to adjust the parameters as follows:
where a constant coefficient controls the L2 regularization level to prevent overfitting.
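The loss described above is the familiar combination of a value error, a policy cross-entropy, and an L2 penalty; a minimal PyTorch sketch, with the L2 term supplied through the optimizer's weight decay and all tensor shapes assumed, is:

```python
# Sketch of the ACNN training loss of Section 3.1.4: MSE between the predicted
# value v and the reward z, plus cross-entropy between the MCTS search
# probabilities pi and the network's action scores. Shapes are illustrative.
import torch
import torch.nn.functional as F

def acnn_loss(p_logits, v_pred, pi_target, z_target):
    """p_logits: (B, A) action scores; v_pred: (B,) values;
    pi_target: (B, A) search probabilities; z_target: (B,) episode rewards."""
    value_loss = F.mse_loss(v_pred, z_target)                                  # (z - v)^2
    policy_loss = -(pi_target * F.log_softmax(p_logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss

# L2 regularization handled via weight decay, e.g.:
# optimizer = torch.optim.SGD(acnn.parameters(), lr=1e-2, weight_decay=1e-4)
```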
During each episode, a mini-batch of training samples is randomly selected from the data set containing all the episode data, and the ACNN parameters are adjusted via batch training. After this adjustment, the ACNN enhances the RO agent's decision-making capabilities for the next episode. The initial ACNN parameters are randomly generated. As the number of episodes increases, the trained ACNN provides stronger guidance for the tree search. After a sufficient number of episodes, the ACNN is well trained, and the MSEs and CELs stabilize. The RO agent can then effectively support decision-making to solve the RO problem for the IN under localized attacks. Furthermore, the ACNN does not need to be retrained during the IN's operation phase when a localized attack occurs.
3.2 Decision-making of the agent
At each time step of the MCRP, the RO agent executes an AC-MCTS, guided by an ACNN, to select a repair action.
3.2.1 Actor-critic Monte Carlo tree search algorithm
The AC-MCTS algorithm generates a search tree. In this search tree, each tree node represents a repair state and contains tree edges. Each tree edge describes a legal repair action in the legal repair action space. Each crew at the current repair state has its own legal repair action space, which contains the repair actions for damaged IN nodes and edges, except for the repair actions that other crews are currently executing.
Each tree edge stores a set of statistics: the prior repair probability of selecting the edge, the visit count of the edge, the total repair action value, and the mean repair action value.
The AC-MCTS algorithm has four steps. Multiple simulations proceed by iterating over the first three steps, as shown in Fig.3, steps (a)–(c). The last step selects a repair action, as shown in Fig.3, step (d).
Fig.3 Actor-critic Monte Carlo tree search algorithm for the RO agent.
(1) Selection
The search tree executes the selection phase beginning at a root tree node and finishing at a leaf tree node. The current repair state is taken as the root tree node. At each step of this selection phase, the search tree selects a repair action as follows:
where the selection score is an intermediate variable based on a variant of the PUCT algorithm (
Rosin, 2011), a constant determines the selection level, and the score also depends on the cumulative visit count of the tree node and the cumulative visit count of the tree edge.
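The displayed selection rule did not survive extraction; a standard PUCT variant consistent with the quantities named above (prior probability, node and edge visit counts, and a selection constant) would be, with all symbols assumed:

$$a=\arg\max_{a'}\left[Q(s,a')+c_{\mathrm{puct}}\,P(s,a')\,\frac{\sqrt{N(s)}}{1+N(s,a')}\right],$$

where $Q(s,a')$ is the mean repair action value, $P(s,a')$ the prior repair probability, $N(s)$ the cumulative visit count of the tree node, $N(s,a')$ the visit count of the tree edge, and $c_{\mathrm{puct}}$ the constant determining the selection level.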
(2) Expansion and evaluation
The search tree uses the ACNN to expand and evaluate the leaf tree node. The ACNN takes the leaf tree node and the adjacency matrix as inputs, and the search tree then expands the leaf tree node and stores the statistics according to the outputs of the ACNN. The prior repair probability matrix output by the ACNN has the shape of an adjacency matrix, and its (i, j) element is the probability of selecting the repair action (i, j). Generally, a search tree without the ACNN considers all of the legal tree edges when expanding this leaf tree node, whereas the search tree with the ACNN uses the prior repair probabilities to select the tree edges, as shown in Fig.3, step (b). Each tree edge of the leaf tree node is initialized as follows:
where the prior repair probability of selecting each tree edge is the corresponding element of the ACNN output matrix.
(3) Backpropagation
A backward pass is executed through each tree node visited in the above selection phase to update the tree edge statistics. The statistics of each tree edge are incremented as follows:
where v is the value predicted by the ACNN, a normalization parameter used to evaluate the reward of the current repair action selection.
(4) Repair
After multiple simulations proceed by iterating over the above three steps, a high-quality repair action a* is selected in the fourth step using a search probability, which is an adjacency matrix whose (i, j) element is the probability of selecting the repair action (i, j). This search probability is proportional to the exponentiated visit count, as follows:
where a temperature parameter is used to control the exploration level. The repair action a* is executed to update the repair state at the root tree node.
For the next time step of the MCRP, the subsequent tree search is executed, and the search tree is reused: the new root tree node is the child tree node corresponding to the last high-quality repair action a*. The search tree retains the subtree below this child tree node along with its statistics and discards the remainder of the search tree. In summary, the tree search maps the current repair state, guided by the ACNN outputs, to a high-quality repair action.
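A compact sketch of the statistics maintained by these steps is given below; the symbol names (N, W, Q, P) are the conventional ones and are assumed here, with the final action sampled in proportion to the exponentiated visit counts.

```python
# Sketch of the AC-MCTS tree-edge statistics of Section 3.2.1: expansion initializes
# the statistics, backpropagation folds in the ACNN value, and the final "repair"
# step selects an action from the visit counts raised to 1/temperature.
import random

class TreeEdge:
    def __init__(self, prior_p):
        self.N = 0          # visit count
        self.W = 0.0        # total repair action value
        self.Q = 0.0        # mean repair action value
        self.P = prior_p    # prior repair probability from the ACNN actor head

    def backup(self, v):
        """Backpropagation: fold the ACNN value v into this edge's statistics."""
        self.N += 1
        self.W += v
        self.Q = self.W / self.N

def select_action(edges, temperature=1.0):
    """Repair step: sample an action with probability proportional to N^(1/T).
    edges: dict mapping repair action (i, j) -> TreeEdge."""
    actions = list(edges.keys())
    weights = [edges[a].N ** (1.0 / temperature) for a in actions]
    total = sum(weights)
    if total == 0:                      # no simulations yet: fall back to uniform
        weights = [1.0] * len(actions)
        total = float(len(actions))
    probs = [w / total for w in weights]
    return random.choices(actions, weights=probs, k=1)[0], dict(zip(actions, probs))
```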
3.2.2 Actor-critic neural network
The ACNN is a double-input, double-output neural network, as shown in Fig.2. At each time step, the inputs of the ACNN are the current repair state and the adjacency matrix before damage. The ACNN processes the repair state features via a residual module that consists of residual blocks (
He et al., 2016) of convolutional layers with ReLU functions, whereas the adjacency matrix before damage is a constant matrix and is processed by a multilayer module that consists of fully connected layers.
To capture the features in the two-layer decision-making architecture, the current repair state consists of a fleet-level repair state and a crew-level repair state. The fleet-level repair state can be written as follows:
where each channel is the adjacency matrix at a time step t of the MCRP, and channels corresponding to time steps before the repair process begins are filled with the initial damaged adjacency matrix. History adjacency matrices are necessary because the MCRP is a sequential decision process, so the repair state is not fully observable to the RO agent from the current adjacency matrix alone. Moreover, the crew-level repair state can be written as follows:
where each channel is the adjacency matrix at a time step of the crew c repair process, with the same convention for time steps before that crew's repair process begins.
The residual and multilayer modules consist of shared layers and merge into two distinct modules: the actor head and the critic head. Both the actor head and the critic head receive inputs from the shared layers. The inputs from the residual module are processed by residual convolutional layers and a fully connected layer, whereas the inputs from the multilayer module are processed by fully connected layers. Then, the actor head and critic head output prior repair probabilities and values via a fully connected layer and a sigmoid function.
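An illustrative PyTorch sketch of this architecture is shown below; the layer counts, widths, and naming are assumptions, and only the overall structure (residual module for the stacked repair-state matrices, fully connected module for the pre-damage adjacency matrix, shared features, and actor/critic heads with sigmoid outputs) follows the description above.

```python
# Illustrative sketch of the ACNN of Section 3.2.2; layer sizes are assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        return torch.relu(x + self.c2(torch.relu(self.c1(x))))

class ACNN(nn.Module):
    def __init__(self, n_nodes, state_channels, hidden=64):
        super().__init__()
        # residual module for the repair state (stacked adjacency matrices)
        self.res = nn.Sequential(nn.Conv2d(state_channels, hidden, 3, padding=1),
                                 ResBlock(hidden), ResBlock(hidden))
        # multilayer (fully connected) module for the pre-damage adjacency matrix
        self.fc_adj = nn.Sequential(nn.Linear(n_nodes * n_nodes, hidden), nn.ReLU())
        merged = hidden * n_nodes * n_nodes + hidden
        self.actor = nn.Linear(merged, n_nodes * n_nodes)    # prior repair probabilities
        self.critic = nn.Linear(merged, 1)                    # value of the repair state

    def forward(self, state, adj0):
        # state: (B, state_channels, n, n) repair state; adj0: (B, n, n) pre-damage matrix
        h1 = self.res(state).flatten(1)
        h2 = self.fc_adj(adj0.flatten(1))
        h = torch.cat([h1, h2], dim=1)
        n = adj0.shape[-1]
        p = torch.sigmoid(self.actor(h)).view(-1, n, n)       # matrix of prior probabilities
        v = torch.sigmoid(self.critic(h)).squeeze(-1)          # scalar value in (0, 1)
        return p, v
```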
4 Case study
Python is used to simulate the case study for the postdisaster restoration of the IN. The power grid data with 228 nodes and 612 edges are chosen as an example for the IN (
Molnar et al., 2021). The power systems are represented as nodes, and the transmission lines between these nodes are represented as edges to construct the graph of the power grid. The case study then examines the resilience and restoration of the power grid under large-scale damage scenarios, and the optimal RO strategies are generated on the basis of the proposed DRL framework. Furthermore, the proposed framework can be applied to other real-world IN scenarios by analyzing the distributed geographical characteristics of nodes and edges.
4.1 Postdisaster restoration analysis
4.1.1 Damage examples
Fig.4 shows four damage cases of the power grid, with randomly generated attack centers and damage radii. The damaged nodes are highlighted in pink. The destruction information for each damage case is shown in Tab.1.
Fig.4 Different repair processes with multiple crews.
Tab.1 Information of each damage case
Damage case | Damage center | Damage radius (km) | Damaged nodes | Damaged edges | Current FOM
Case 1 | 8.64810°W, 51.82235°N | 188.59 | 62 | 186 | 0.496677
Case 2 | 9.19534°W, 50.28620°N | 187.06 | 43 | 140 | 0.585140
Case 3 | 8.92320°W, 53.04213°N | 182.39 | 47 | 140 | 0.627756
Case 4 | 11.22784°W, 48.82155°N | 185.84 | 34 | 102 | 0.771439
The repair activities are conducted concurrently by multiple repair crews with varying repair capabilities. For this section, we consider five repair crews with different values of MTTR, vedge, and vmove. The nodes are categorized into three levels: first-level, second-level, and third-level nodes, with node weights w belonging to (0, 2], (2, 4], and (4, +∞), respectively. Each repair crew has different MTTRs for the three levels of nodes: MTTR1st, MTTR2nd, and MTTR3rd. The repair capabilities of each repair crew are shown in Tab.2. The repair capabilities can also be assigned on the basis of actual conditions. The four MCRPs in Fig.4 are executed by a crew fleet with Crews 1, 2, and 3.
Tab.2 Information of each repair crew
Parameter | Crew 1 | Crew 2 | Crew 3 | Crew 4 | Crew 5
MTTR1st (h) | 1 | 1.5 | 2 | 1.2 | 1.8
MTTR2nd (h) | 3.2 | 4.1 | 4.8 | 3.7 | 4.5
MTTR3rd (h) | 6.1 | 7.2 | 7.9 | 6.6 | 7.6
vedge (km/h) | 1.8 | 2 | 2.1 | 1.9 | 2
vmove (km/h) | 50 | 55 | 60 | 52 | 58
Fig.5 MSEs (a) and CELs (b) of the ACNN.
4.1.2 Repair strategies and resilience analysis
For each damage case depicted in Fig.4, it is assumed that a localized attack event occurs at time 200 h, resulting in a significant degradation in the functional performance of the power grid. The power grid subsequently undergoes repair actions to recover from the localized attack, leading to an improvement in functional performance. The functional performance reaches the FOM threshold of 0.96 at the recovery time τr. Generally, different MCRPs have different values of τr, and a resilience analysis of the four MCRPs is shown in Fig.4. Notably, the attack time is an assumed value and can be set to any random number. The FOM threshold is an empirical parameter and can be set on the basis of the actual situation. To facilitate a clear comparison of different values of τr, a deterministic attack time is chosen for the case study. The RLs of the four MCRPs are shown in Fig.4. Tab.3 presents the RL and recovery time for each damage case.
Tab.3 Resilience loss and recovery time for each damage case
Damage case | Case 1 | Case 2 | Case 3 | Case 4
Resilience loss RL | 202.4516 | 145.8626 | 114.4251 | 62.3091
Recovery time τr (h) | 984.123 | 829.632 | 779.455 | 649.462
4.1.3 Empirical analysis of agent training
The proposed method is applied to train the RO agent via a single machine equipped with one Intel i9 7980XE CPU and four RTX2080 TI-A11G GPUs. The training process begins with a completely random ACNN and continues for approximately 28 h without human intervention. During the training process, 10,000 MCRPs are executed, with 800 simulations for each MCTS, resulting in nearly 0.2 s of thinking time per repair action. The ACNN parameters are updated via mini-batches from the data set generated by the MCRPs.
Fig.5 shows the MSEs and CELs of the ACNN after each episode of the MCRP. The MSE measures the discrepancy between the actual outcome and the value predicted by the ACNN, which estimates the eventual optimization of the RL. The CEL measures the similarity between the ACNN prior probabilities and the MCTS search policy. The hyperparameters are set to fixed empirical values. Although the mathematical forms of the MSE and CEL differ, the ACNN is trained on the same training data set and loss function (Eq. (17)), and the trends of the two curves are similar in some episodes of training. After approximately 8000 MCRP episodes, the MSE and CEL tend to stabilize, enabling the ACNN to efficiently guide the MCTS in selecting repair actions.
4.2 Discussion
In the context of IN restoration, the primary objective is to obtain an optimal feasible solution. Extensive analyses of the optimization solution are conducted to assess the influences of different factors.
4.2.1 Different algorithms
An AC-MCTS method with high efficiency is proposed for IN postdisaster restoration in this paper. Additionally, a PPO method, a DDPG method, a double-DQN method, an ACS method, and a heuristic hybrid game (HHG) method (
Feng et al., 2016) are employed to address the same problem. Each of these six methods is analyzed separately by executing a Python program for each method to solve the four damage cases presented in Tab.1. The AC-MCTS method outperforms the other methods in terms of RL, as illustrated in Fig.6. In the initial MCRP stage for each damage case, the PPO, DDPG, and double-DQN methods may achieve a higher FOM than the AC-MCTS method; however, these three methods may also lead to local optima, such as situations where multiple crews must spend more time traveling during the middle and late MCRP stages. For certain cases, such as those depicted in Fig.6(a) and 6(c), the PPO, DDPG, and double-DQN methods achieve a better RL than the HHG and ACS methods do, but the HHG and ACS methods complete their MCRPs sooner than these three methods do. According to the findings presented in Fig.6, the AC-MCTS method achieves a superior RL in less time than the other methods do because it uses the repair action value predicted by the ACNN and estimates the long-delayed rewards of future repair actions in a search tree, thereby striving to achieve a better RL over the entire MCRP. Therefore, the results demonstrate that the AC-MCTS method produces optimized strategies for postdisaster restoration.
Fig.6 Resilience loss of different methods for each damage case.
The efficiencies of the methods are analyzed in Tab.4. According to these results, the solution speeds of the AC-MCTS, PPO, DDPG, and double-DQN methods remain stable and do not decrease exponentially as the numbers of damaged nodes and edges increase. However, the solution speeds of the ACS and HHG methods decrease significantly as the damage level increases.
Tab.4 Running times (in seconds) of different methods
Method | Case 1 | Case 2 | Case 3 | Case 4
AC-MCTS | 97.672 | 89.243 | 90.187 | 84.587
PPO | 95.647 | 87.952 | 90.012 | 83.245
DDPG | 94.313 | 87.115 | 89.584 | 81.313
Double-DQN | 94.845 | 87.558 | 89.876 | 82.545
ACS | 312.647 | 237.564 | 232.842 | 208.549
HHG | 325.953 | 246.476 | 249.295 | 212.377
4.2.2 Different numbers of crews
The same damage case (Damage Case 1 in Tab.1) is maintained, and crew fleet sizes of three, four, and five are examined via the six different methods. The sensitivity results for a three-crew fleet (Crews 1, 2, and 3) are displayed in Fig.6(a), whereas the results for a four-crew fleet (Crews 1, 2, 3, and 4) and a five-crew fleet (Crews 1, 2, 3, 4, and 5) are shown in Fig.7(a) and 7(b), respectively. The RL and recovery time for different crew numbers are documented in Tab.5. As the number of crews increases, the RL and total repair time decrease. The RL achieved by the RO agent is superior to that of the other methods and is obtained in less time. Thus, these findings demonstrate that the RO agent performs consistently in postdisaster restoration problems with varying crew sizes. Notably, the number of crews is generated randomly during the RO agent training process.
Fig.7 Comparison of resilience with different numbers of crews.
Tab.5 Resilience loss and recovery time for different numbers of crews
Method | Crews 1–3 RL | Crews 1–3 τr (h) | Crews 1–4 RL | Crews 1–4 τr (h) | Crews 1–5 RL | Crews 1–5 τr (h)
AC-MCTS | 202.4516 | 984.123 | 157.4468 | 779.136 | 133.5588 | 670.447
PPO | 232.2561 | 1034.058 | 178.4199 | 840.131 | 139.7974 | 736.981
DDPG | 235.7050 | 1053.520 | 168.7603 | 836.657 | 140.8523 | 719.631
Double-DQN | 236.7415 | 1046.361 | 171.4111 | 835.263 | 143.9193 | 723.802
ACS | 258.3306 | 1036.819 | 189.6965 | 863.517 | 158.3478 | 730.855
HHG | 256.4923 | 1011.631 | 190.8799 | 850.423 | 162.4286 | 724.925
4.2.3 Different abilities of crews
In addition, focusing on the same damage case (Damage Case 1 in Tab.1) and employing the RO agent, we analyze performance differences when crew abilities vary. Resilience analyses for a fleet comprising Crews 1, 2, and 3 are shown in Fig.4 (the MCRP for damage case 1 and its resilience analysis), whereas analyses for fleets comprising Crews 2, 3, and 4 and Crews 3, 4, and 5 are illustrated in Fig.8(a) and 8(b), respectively. The RL and recovery time for varying crew abilities are summarized in Tab.6. With the same RO agent, the optimal strategies achieve similar RLs with nearly identical total repair times. These results demonstrate the consistent performance of the RO agent across crews with varying abilities.
Fig.8 Different abilities of crews: (a) Crews 2–4; (b) Crews 3–5.
Tab.6 Resilience loss and recovery time for different abilities of crews
Crew fleet | Crews 1–3 | Crews 2–4 | Crews 3–5 |
Resilience loss RL | 202.4516 | 206.9847 | 208.4678 |
Recovery time τr (h) | 984.123 | 993.253 | 1001.279 |
5 Conclusions
The postdisaster restoration of INs has been a vital area of research in recent years. In this paper, a well-formulated research problem is presented, taking into consideration the optimal repair sequence and route selection issues for increasing the resilience of INs by engaging multiple crews. This paper presents a two-layer decision-making DRL framework for RO strategies with multiple crews, along with a solution methodology, namely, AC-MCTS, to solve the problem effectively. A case study is conducted in Python using the 228-node power grid as an exemplary IN to show that the proposed AC-MCTS method can generate a globally optimal RO strategy to minimize the overall RL for multiple crews. The AC-MCTS algorithm determines not only the optimal repair routes but also the optimal decisions on the sequence of repairs. The results of the case study show that, compared with other approaches, the AC-MCTS method yields a better RL while reducing the time taken for repairs. Moreover, the proposed AC-MCTS method can support effective and consistent restoration decision-making in larger-scale INs. As the scale of the IN further increases, the training cost of the ACNN increases. Nevertheless, the proposed method can exploit distributed computing and parallel processing via languages such as C++, which can reduce the training time considerably.
In the future, further research on optimal planning strategies for initial repair crews and resources is needed. Other approaches to the distributed policy optimization of the multiple-crew planning problem for IN restoration, such as multiagent DRL, are also worth investigating.