Jun 2024, Volume 25 Issue 6
    

  • Review
    Weilin YUAN, Jiaxing CHEN, Shaofei CHEN, Dawei FENG, Zhenzhen HU, Peng LI, Weiwei ZHAO

    Reinforcement learning (RL) has become a dominant decision-making paradigm and has achieved notable success in many real-world applications. Notably, deep neural networks play a crucial role in unlocking RL’s potential in large-scale decision-making tasks. Inspired by the recent major success of Transformers in natural language processing and computer vision, researchers have overcome numerous bottlenecks by combining Transformers with RL for decision-making. This paper presents a multi-angle systematic survey of various Transformer-based RL (TransRL) models applied in decision-making tasks, including basic models, advanced algorithms, representative implementation instances, typical applications, and known challenges. Our work aims to provide insights into problems that inherently arise with current RL approaches, and examines how we can address them with better TransRL models. To our knowledge, we are the first to present a comprehensive review of recent Transformer research developments in RL for decision-making. We hope that this survey provides a comprehensive review of TransRL models and inspires the RL community in its pursuit of future directions. To keep track of the rapid TransRL developments in the decision-making domains, we summarize the latest papers and their open-source implementations at https://github.com/williamyuanv0/Transformer-in-Reinforcement-Learning-for-Decision-Making-A-Survey.

  • Ziyang XING, Xiaoqiang DI, Hui QI, Jing CHEN, Jinhui CAO, Jinyao LIU, Xusheng LI, Zichu ZHANG, Yuchen ZHU, Lei CHEN, Kai HUANG, Xinghan HUO

    Information-centric satellite networks play a crucial role in remote sensing applications, particularly in the transmission of remote sensing images. However, the occurrence of burst traffic poses significant challenges in meeting the increased bandwidth demands. Traditional content delivery networks are ill-equipped to handle such bursts due to their pre-deployed content. In this paper, we propose an optimal replication strategy for mitigating burst traffic in information-centric satellite networks, specifically focusing on the transmission of remote sensing images. Our strategy involves selecting the optimal replication delivery satellite node when multiple users subscribe to the same remote sensing content within a short time, effectively reducing network transmission data and preventing throughput degradation caused by burst traffic expansion. We formulate the content delivery process as a multi-objective optimization problem and apply Markov decision processes to determine the optimal value for burst traffic reduction. To address these challenges, we leverage federated reinforcement learning techniques. Additionally, we use Bloom filters with subdivision and data identification methods to enable rapid retrieval and encoding of remote sensing images. Through software-based simulations using a low Earth orbit satellite constellation, we validate the effectiveness of our proposed strategy, achieving a significant 17% reduction in the average delivery delay. This paper offers valuable insights into efficient content delivery in satellite networks, specifically targeting the transmission of remote sensing images, and presents a promising approach to mitigate burst traffic challenges in information-centric environments.
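    As an illustration of the Bloom filter idea mentioned in the abstract above, the following is a minimal sketch only. The class name, the bit-array size, the salted SHA-256 hashing scheme, and the content name are all assumptions made for this example; the paper's subdivision and data identification methods are not reproduced here.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-slot array (illustrative)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index (assumed scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))


cache_index = BloomFilter()
cache_index.add("tile/region-A/0001")  # hypothetical content name
```

    A membership test such as `"tile/region-A/0001" in cache_index` never misses an item that was added, but it may occasionally report an item that was not, which is the usual trade-off for constant-time lookups.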

  • Huifen XIA, Yongzhao ZHAN, Honglin LIU, Xiaopeng REN

    Temporal action localization (TAL) is the task of detecting the start and end timestamps of action instances and classifying them in an untrimmed video. As the number of action categories per video increases, existing weakly-supervised TAL (W-TAL) methods with only video-level labels cannot provide sufficient supervision. Single-frame supervision has attracted the interest of researchers. Existing paradigms model single-frame annotations from the perspective of video snippet sequences, neglect action discrimination of annotated frames, and do not pay sufficient attention to their correlations in the same category. Considering a category, the annotated frames exhibit distinctive appearance characteristics or clear action patterns. Thus, a novel method to enhance action discrimination via category-specific frame clustering for W-TAL is proposed. Specifically, the K-means clustering algorithm is employed to aggregate the annotated discriminative frames of the same category, which are regarded as exemplars to exhibit the characteristics of the action category. Then, the class activation scores are obtained by calculating the similarities between a frame and exemplars of various categories. Category-specific representation modeling can provide complementary guidance to snippet sequence modeling in the mainline. As a result, a convex combination fusion mechanism is presented for annotated frames and snippet sequences to enhance the consistency properties of action discrimination, which can generate a robust class activation sequence for precise action classification and localization. Due to the supplementary guidance of action discriminative enhancement for video snippet sequences, our method outperforms existing single-frame annotation based methods. Experiments conducted on three datasets (THUMOS14, GTEA, and BEOID) show that our method achieves high localization performance compared with state-of-the-art methods.
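    The pipeline described above (cluster annotated frames per category, score a frame against the exemplars, fuse with the snippet branch) can be sketched roughly as follows. This is a toy illustration under stated assumptions: cosine similarity, taking the maximum over a class's exemplars, the weight `alpha`, and the two-dimensional features are all choices made for this example and are not taken from the paper.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; centroids start from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def exemplar_scores(frame, exemplars_by_class):
    """Per-class score: best cosine similarity to that class's exemplars (assumed rule)."""
    return {cls: max(cosine(frame, e) for e in exemplars)
            for cls, exemplars in exemplars_by_class.items()}


def fuse(exemplar_score, snippet_score, alpha=0.5):
    """Convex combination: alpha in [0, 1] weights the exemplar branch."""
    return alpha * exemplar_score + (1.0 - alpha) * snippet_score


# Toy annotated frame features per category (hypothetical data).
annotated = {
    "run": [(1.0, 0.1), (0.9, 0.2), (1.0, 0.0)],
    "jump": [(0.1, 1.0), (0.0, 0.9), (0.2, 1.0)],
}
exemplars = {cls: kmeans(frames, k=1) for cls, frames in annotated.items()}
scores = exemplar_scores((0.8, 0.1), exemplars)
```

    Because the fusion weights sum to one, the fused score always lies between the exemplar score and the snippet score.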

  • Yang LI, Ziling WEI, Jinshu SU, Baokang ZHAO

    Multi-access edge computing (MEC) presents computing services at the edge of networks to address the enormous processing requirements of intelligent applications. Due to the maneuverability of unmanned aerial vehicles (UAVs), they can be used as temporary aerial edge nodes for providing edge services to ground users in MEC. However, the MEC environment is usually dynamic and complicated. It is a challenge for multiple UAVs to select appropriate service strategies. Besides, most existing works study UAV-MEC with the assumption that the flight heights of UAVs are fixed; i.e., the flying is considered to occur with reference to a two-dimensional plane, which neglects the importance of the height. In this paper, with consideration of the co-channel interference, an optimization problem of energy efficiency is investigated to maximize the number of fulfilled tasks, where multiple UAVs in a three-dimensional space collaboratively fulfill the task computation of ground users. In the formulated problem, we try to obtain the optimal flight and sub-channel selection strategies for UAVs and schedule strategies for tasks. Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, we propose a curiosity-driven and twin-networks-structured MADDPG (CTMADDPG) algorithm to solve the formulated problem. It uses the inner reward to facilitate the state exploration of agents, avoiding convergence at the sub-optimal strategy. Furthermore, we adopt the twin critic networks for update stabilization to reduce the probability of Q value overestimation. The simulation results show that CTMADDPG is outstanding in maximizing the energy efficiency of the whole system and outperforms the other benchmarks.
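    The two ingredients named above, a curiosity-style inner reward and twin critics, are commonly realized as shown in this minimal sketch. It assumes a forward-model prediction error as the inner reward and a TD3-style minimum over two target critics; both are common choices and may differ from the exact CTMADDPG formulation. All function and parameter names are hypothetical.

```python
def intrinsic_reward(predicted_next_state, actual_next_state, scale=0.1):
    """Curiosity bonus: scaled squared error of a forward model's prediction (assumed form)."""
    error = sum((p - a) ** 2 for p, a in zip(predicted_next_state, actual_next_state))
    return scale * error


def twin_critic_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """TD target that uses the smaller of two target-critic estimates."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


# Example: the extrinsic reward is augmented by the curiosity bonus before the target is formed.
bonus = intrinsic_reward((1.0, 0.0), (0.0, 0.0), scale=0.5)
target = twin_critic_target(1.0 + bonus, q1_next=2.0, q2_next=3.0, gamma=0.5)
```

    Taking the minimum of the two estimates biases the target downward, which is the usual argument for why twin critics counteract Q value overestimation.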

  • Zhiyu DUAN, Shunkun YANG, Qi SHAO, Minghao YANG

    Epigenetics’ flexibility in terms of finer manipulation of genes renders unprecedented levels of refined and diverse evolutionary mechanisms possible. From the epigenetic perspective, the main limitations to improving the stability and accuracy of genetic algorithms are as follows: (1) the unchangeable nature of the external environment, which leads to excessive disorders in the changed phenotype after mutation and crossover; (2) the premature convergence due to the limited types of epigenetic operators. In this paper, a probabilistic environmental gradient-driven genetic algorithm (PEGA) considering epigenetic traits is proposed. To enhance the local convergence efficiency and acquire stable local search, a probabilistic environmental gradient (PEG) descent strategy together with a multi-dimensional heterogeneous exponential environmental vector tendentiously generates more offspring along the gradient in the solution space. Moreover, to balance exploration and exploitation at different evolutionary stages, a variable nucleosome reorganization (VNR) operator is realized by dynamically adjusting the number of genes involved in mutation and crossover. Based on the above-mentioned operators, three epigenetic operators are further introduced to weaken the possible premature convergence problem by enriching genetic diversity. The experimental results on the open Congress on Evolutionary Computation-2017 (CEC’17) benchmark over 10-, 30-, 50-, and 100-dimensional tests indicate that the proposed method outperforms 10 state-of-the-art evolutionary and swarm algorithms in terms of accuracy and stability on comprehensive performance. The ablation analysis demonstrates that for accuracy and stability, the fusion strategy of PEG and VNR is effective on 96.55% of the test functions and can improve the indicators by up to four orders of magnitude. Furthermore, the performance of PEGA on the real-world spacecraft trajectory optimization problem is the best in terms of quality of the solution.

  • Feng LI, Hao YANG, Qingfeng CAO

    A novel separation identification strategy for the neural fuzzy Wiener–Hammerstein system using hybrid signals is developed in this study. The Wiener–Hammerstein system is described by a model consisting of two linear dynamic elements with a nonlinear static element in between. The static nonlinear element is modeled by a neural fuzzy network (NFN) and the two linear dynamic elements are modeled by an autoregressive exogenous (ARX) model and an autoregressive (AR) model, separately. When the system input is Gaussian signals, the correlation technique is used to decouple the identification of the two linear dynamic elements from the nonlinear element. First, based on the input and output of Gaussian signals, the correlation analysis technique is used to identify the input linear element and output linear element, which addresses the problem that the intermediate variable information cannot be measured in the identified Wiener–Hammerstein system. Then, a zero-pole match method is adopted to separate the parameters of the two linear elements. Furthermore, the recursive least-squares technique is used to identify the nonlinear element based on the input and output of random signals, which avoids the impact of output noise. The feasibility of the presented identification technique is demonstrated by an illustrative simulation example and a practical nonlinear process. Simulation results show that the proposed strategy can obtain higher identification precision than existing identification algorithms.
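    For readers unfamiliar with the recursive least-squares technique mentioned above, a generic update step is sketched below. It assumes a model that is linear in its parameters, y ≈ φᵀθ, with an optional forgetting factor `lam`; the regressor construction for the neural fuzzy network is not shown, and the toy data and all names are hypothetical.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step for y ≈ phi · theta (P is assumed symmetric)."""
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = y - sum(phi[i] * theta[i] for i in range(n))  # a priori prediction error
    theta = [theta[i] + gain[i] * error for i in range(n)]
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P


# Toy data generated from y = 2*x1 - 1*x2 without noise (hypothetical example).
theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial covariance: weak prior on theta
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, -1.0), 5.0)]
for phi, y in samples:
    theta, P = rls_update(theta, P, phi, y)
```

    Each step refines the estimate using only the newest sample, so no batch of past data needs to be stored.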

  • Zhenyi ZHANG, Jie HUANG, Congjie PAN

    Reinforcement learning behavioral control (RLBC) is limited to an individual agent without any swarm mission, because it models the behavior priority learning as a Markov decision process. In this paper, a novel multi-agent reinforcement learning behavioral control (MARLBC) method is proposed to overcome such limitations by implementing joint learning. Specifically, a multi-agent reinforcement learning mission supervisor (MARLMS) is designed for a group of nonlinear second-order systems to assign the behavior priorities at the decision layer. Through modeling behavior priority switching as a cooperative Markov game, the MARLMS learns an optimal joint behavior priority to reduce dependence on human intelligence and high-performance computing hardware. At the control layer, a group of second-order reinforcement learning controllers are designed to learn the optimal control policies to track position and velocity signals simultaneously. In particular, input saturation constraints are strictly implemented via designing a group of adaptive compensators. Numerical simulation results show that the proposed MARLBC has a lower switching frequency and control cost than finite-time and fixed-time behavioral control and RLBC methods.

  • Yan WEI, Mingshuang HAO, Xinyi YU, Linlin OU

    This paper investigates the issue of adaptive optimal tracking control for nonlinear systems with dynamic state constraints. An asymmetric time-varying integral barrier Lyapunov function (ATIBLF) based integral reinforcement learning (IRL) control algorithm with an actor–critic structure is first proposed. The ATIBLF items are appropriately arranged in every step of the optimized backstepping control design to ensure that the dynamic full-state constraints are never violated. Thus, optimal virtual/actual control in every backstepping subsystem is decomposed with ATIBLF items and also with an adaptive optimized item. Meanwhile, neural networks are used to approximate the gradient value functions. According to the Lyapunov stability theorem, the boundedness of all signals of the closed-loop system is proved, and the proposed control scheme ensures that the system states are within predefined compact sets. Finally, the effectiveness of the proposed control approach is validated by simulations.