2026, Volume 35 Issue 1

  • In recent years, cooperative path planning for unmanned aerial vehicles (UAVs) has attracted increasing research attention. For the multi-UAV cooperative path planning problem, path planning in a three-dimensional (3D) environment is transformed into an optimization problem by introducing a fitness function and constraints such as minimizing path length, maintaining a low and stable flight altitude, and avoiding threat zones. A multi-strategy hybrid grey wolf optimization (MSHGWO) algorithm is proposed to address this problem. Firstly, a cubic chaotic map is introduced to initialize the grey wolf positions, making the initial population distribution more uniform. Secondly, an adaptive weight factor is designed that adjusts the movement weight based on the rate of fitness decrease per unit Euclidean distance, thereby improving the quality of the population. Finally, an elite opposition-based learning strategy is introduced to improve population diversity so that the population can escape local optima. Simulation results indicate that MSHGWO generates constraint-compliant paths for each UAV in complex 3D environments and outperforms competing algorithms in convergence speed and solution quality. Flight experiments further validate the path planning capability of MSHGWO in real-world obstacle environments, demonstrating the feasibility of the proposed multi-UAV cooperative path planning approach.
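The abstract names a cubic chaotic map but not its exact form; a minimal sketch of chaos-based population initialization, assuming the common map x_{k+1} = ρ·x_k·(1 − x_k²) with ρ ≈ 2.595, which keeps iterates in (0, 1). The map form and ρ are assumptions, not taken from the paper.

```python
import numpy as np

def cubic_chaotic_init(pop_size, dim, lower, upper, rho=2.595, seed=0):
    """Initialize a population with the cubic chaotic map
    x_{k+1} = rho * x_k * (1 - x_k**2), which stays in (0, 1) for rho ~ 2.595."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.05, 0.95, size=dim)     # chaotic seed per dimension
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = rho * x * (1.0 - x ** 2)          # one chaotic iteration
        pop[i] = lower + x * (upper - lower)  # map (0, 1) into the search box
    return pop
```

Each row is one chaotic iterate mapped into [lower, upper]; successive iterates are deterministic but non-repeating, which is what makes the initial distribution more uniform than a single random draw per wolf.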
  • In this paper, a practical method named linear active disturbance rejection control (LADRC) with adaptive tuning is proposed for attitude control of a small-scale unmanned helicopter. The proposed method accounts for external disturbances, internal dynamic uncertainties, and deviations arising from parameter uncertainty, while keeping the number of adjustable parameters relatively small. Furthermore, it addresses the limitation that the stability of conventional active disturbance rejection control methods cannot be rigorously analyzed. The total disturbance of the unmanned helicopter is estimated and compensated by the designed LADRC. The introduction of adaptive control realizes online parameter tuning, which eliminates parameter deviation and further improves control precision. It also provides a novel way to prove the stability of the controller, allowing analysis via a Lyapunov function. Finally, the anti-disturbance performance and effectiveness of the proposed method are verified by numerical simulation.
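The abstract does not give the observer equations; a common LADRC building block is the linear extended state observer (ESO), which estimates the total disturbance as an extra state. A minimal Euler-integrated sketch for a single attitude channel, using Gao's bandwidth parameterization for the gains (ω_o and b_0 here are illustrative assumptions, not the paper's values):

```python
def eso_step(z, y, u, dt, wo=10.0, b0=1.0):
    """One Euler step of a 3rd-order linear extended state observer.
    z = [output estimate, derivative estimate, total-disturbance estimate];
    observer gains [3*wo, 3*wo**2, wo**3] place all poles at -wo."""
    z1, z2, z3 = z
    e = y - z1                               # output estimation error
    z1 += dt * (z2 + 3 * wo * e)
    z2 += dt * (z3 + 3 * wo ** 2 * e + b0 * u)
    z3 += dt * (wo ** 3 * e)                 # disturbance channel
    return [z1, z2, z3]
```

The estimated total disturbance z3 would then be cancelled in the control law, e.g. u = (u0 − z3) / b0, which is what "estimated and compensated" refers to in the LADRC literature.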
  • To address real-time path planning requirements for multi-unmanned aerial vehicle (multi-UAV) collaboration in complex obstacle environments, this study proposes an improved multi-agent deep deterministic policy gradient algorithm with prioritized experience replay (PER-MADDPG). By designing a multi-dimensional state representation incorporating relative positions, velocity vectors, and obstacle distance fields, we construct a composite reward function that integrates safe obstacle avoidance, formation maintenance, and energy efficiency for environment perception and multi-objective collaborative optimization. The prioritized experience replay mechanism dynamically adjusts sampling weights based on temporal difference (TD) errors, enhancing learning efficiency on high-value samples. Simulation experiments demonstrate that our method generates real-time collaborative paths in 3D complex obstacle environments, reducing training time by 25.3% and 16.8% compared with the traditional MADDPG and the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, respectively, while achieving smaller path-length variances among UAVs. The results validate the effectiveness of prioritized experience replay in multi-agent collaborative decision-making.
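The TD-error-based sampling described above follows the standard proportional prioritized replay scheme; a minimal sketch (α, β, and the eviction policy here are illustrative defaults, not the paper's settings):

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized experience replay:
    P(i) = p_i**alpha / sum_j p_j**alpha, with p_i = |TD error| + eps."""
    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []
        self.rng = np.random.default_rng(seed)

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:     # evict oldest (FIFO)
            self.buffer.pop(0); self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.priorities)
        probs = p / p.sum()
        idx = self.rng.choice(len(self.buffer), size=batch_size, p=probs)
        # importance-sampling weights correct the non-uniform sampling bias
        w = (len(self.buffer) * probs[idx]) ** (-beta)
        return idx, w / w.max()
```

Transitions with large TD error are drawn far more often, which is the "learning efficiency for high-value samples" effect the abstract mentions; the returned weights scale the loss so the gradient stays unbiased.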
  • Robust cooperative unmanned aerial vehicle (UAV) formation in complex 3D environments is hampered by reward sparsity and inefficient collaboration. To address this, we propose context-aware relational agent learning (CORAL), a novel multi-agent deep reinforcement learning framework. CORAL synergistically integrates two modules: (1) a novelty-based intrinsic reward module to drive efficient exploration and (2) an explicit relational learning module that allows agents to predict peer intentions and enhance coordination. Built on a multi-agent Actor-Critic architecture, CORAL enables agents to balance self-interest with group objectives. Comprehensive evaluations in a high-fidelity simulation show that our method significantly outperforms state-of-the-art baselines like multi-agent deep deterministic policy gradient (MADDPG) and monotonic value function factorisation for deep multi-agent reinforcement learning (QMIX) in path planning efficiency, collision avoidance, and scalability.
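The paper's novelty-based intrinsic reward module is not specified in the abstract; one simple stand-in for such a module is a count-based bonus that decays with repeated visits to a discretized state cell (the 1/√N form and the grid discretization are assumptions for illustration only):

```python
from collections import Counter

class NoveltyBonus:
    """Count-based novelty bonus: r_int = beta / sqrt(N(cell)), where
    N(cell) counts visits to a discretized position cell. Rarely visited
    cells yield large bonuses, driving exploration under sparse rewards."""
    def __init__(self, beta=0.1, cell=1.0):
        self.beta, self.cell, self.counts = beta, cell, Counter()

    def __call__(self, pos):
        key = tuple(round(p / self.cell) for p in pos)  # discretize position
        self.counts[key] += 1
        return self.beta / self.counts[key] ** 0.5
```

Added to the environment reward, such a bonus counteracts the reward sparsity the abstract identifies, without changing the task objective in the long run (the bonus vanishes as counts grow).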
  • In dynamic and uncertain reconnaissance missions, effective task assignment and path planning for multiple unmanned aerial vehicles (UAVs) present significant challenges. A stochastic multi-UAV reconnaissance scheduling problem is formulated as a combinatorial optimization task with nonlinear objectives and coupled constraints. To solve this non-deterministic polynomial (NP)-hard problem efficiently, a novel learning-enhanced pigeon-inspired optimization (L-PIO) algorithm is proposed. The algorithm integrates a Q-learning mechanism to dynamically regulate control parameters, enabling adaptive exploration–exploitation trade-offs across different optimization phases. Additionally, geometric abstraction techniques approximate complex reconnaissance regions using maximum inscribed rectangles and spiral path models, allowing precise cost modeling of UAV paths. A formal objective function is developed to minimize global flight distance and completion time while maximizing reconnaissance priority and task coverage. Simulation experiments are conducted under three scenarios: static task allocation, dynamic task emergence, and UAV failure recovery. Comparative analysis with several recent algorithms demonstrates that L-PIO exhibits superior robustness, adaptability, and computational efficiency. The results verify the algorithm’s effectiveness in addressing dynamic reconnaissance task planning in real-time multi-UAV applications.
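The Q-learning mechanism that regulates control parameters can be sketched as a plain tabular update, where states could index optimization phases and actions discrete parameter settings; this state/action design is a hypothetical illustration, not the paper's formulation:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q[s, a] += alpha * (r + gamma * max_a' Q[s_next, a'] - Q[s, a]).
    Here s might be an optimization phase, a a parameter setting, and
    r the resulting fitness improvement."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

Choosing the greedy action argmax_a Q[s, a] (with occasional random exploration) then yields the adaptive exploration–exploitation trade-off the abstract describes.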
  • To address the challenge of achieving decentralized, scalable, and adaptive control of large-scale multiple unmanned aerial vehicle (multi-UAV) swarms in dynamic urban environments with obstacles and wind perturbations, we propose a hybrid framework integrating adaptive reinforcement learning (RL), multi-modal perception fusion, and an enhanced pigeon flock optimization (PFO) with curiosity-driven exploration to enable robust autonomous flight and formation control. The framework leverages meta-learning to optimize RL policies for real-time adaptation, fuses sensor data for precise state estimation, and enhances PFO with learned leader-follower dynamics and exploration rewards to maintain cohesive formations and explore uncertain areas. For swarms of 10–30 UAVs, it achieves 34% faster convergence, 61% lower stability root mean square error (RMSE), 88% fewer collisions, and 85.6%–92.3% success rates in target detection and encirclement, outperforming standard multi-agent RL, pure PFO, and single-modality RL. Three-dimensional trajectory visualizations confirm cohesive formations, collision-free maneuvers, and efficient exploration in urban search-and-rescue scenarios. The innovations include meta-RL for rapid adaptation, multi-modal fusion for robust perception, and curiosity-driven PFO for scalable, decentralized control, advancing real-world multi-UAV swarm autonomy and coordination.
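For reference, the map-and-compass operator of basic pigeon-inspired optimization, which the paper reportedly enhances with leader-follower dynamics and curiosity rewards; the update form and the map-and-compass factor R follow the standard PIO formulation, not the paper's enhanced variant:

```python
import numpy as np

def pio_map_compass(X, V, gbest, t, R=0.2, rng=None):
    """One map-and-compass step of basic pigeon-inspired optimization:
    V <- V * exp(-R * t) + rand * (gbest - X);  X <- X + V.
    Velocities decay over iterations t while pigeons drift toward the
    global best position gbest."""
    rng = rng if rng is not None else np.random.default_rng(0)
    V = V * np.exp(-R * t) + rng.random(X.shape) * (gbest - X)
    return X + V, V
```

Each row of X is one pigeon; the exponential velocity decay shifts the swarm from exploration early on to exploitation around gbest later, which is the behavior the learned enhancements build upon.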
  • Online three-dimensional (3D) path planning in dynamic environments is a fundamental problem for achieving autonomous navigation of unmanned aerial vehicles (UAVs). However, existing methods struggle to model traversable dynamic gaps, resulting in conservative and suboptimal trajectories. To address these challenges, this paper proposes a hierarchical reinforcement learning (RL) framework that integrates global path guidance, local trajectory generation, predictive safety evaluation, and neural network-based decision-making. Specifically, the global planner provides long-term navigation guidance, and the local module then utilizes an improved 3D dynamic window approach (DWA) to generate dynamically feasible candidate trajectories. To enhance safety in dense dynamic scenarios, the algorithm introduces a predictive axis-aligned bounding box (AABB) strategy to model the future occupancy of obstacles, combined with convex hull verification for efficient trajectory safety assessment. Furthermore, a double deep Q-network (DDQN) is employed with structured feature encoding, enabling the neural network to reliably select the optimal trajectory from the candidate set, thereby improving robustness and generalization. Comparative experiments conducted in a high-fidelity simulation environment show that the algorithm outperforms existing algorithms, reducing the average number of collisions to 0.2 while shortening the average task completion time by approximately 15%, and achieving a success rate of 97%.
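The predictive AABB strategy can be sketched directly: propagate each obstacle's bounding box forward in time and test axis-wise interval overlap against a candidate trajectory's box. The constant-velocity motion model below is an assumption for illustration; the paper's occupancy prediction may differ:

```python
def predict_aabb(center, velocity, half_size, t):
    """Axis-aligned bounding box of an obstacle after t seconds,
    assuming constant velocity; returns (min_corner, max_corner)."""
    c = [ci + vi * t for ci, vi in zip(center, velocity)]
    return ([ci - h for ci, h in zip(c, half_size)],
            [ci + h for ci, h in zip(c, half_size)])

def aabb_overlap(a, b):
    """Two AABBs intersect iff their intervals overlap on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(lo1 <= hi2 and lo2 <= hi1
               for lo1, hi1, lo2, hi2 in zip(amin, amax, bmin, bmax))
```

A candidate trajectory sampled at future times can be wrapped in its own AABB and rejected as soon as any predicted obstacle box overlaps it, which keeps the per-candidate safety check cheap.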
  • This paper presents a hierarchical framework for distributed optimal formation transformation control of unmanned aerial vehicle (UAV) swarms in cluttered environments. The framework decouples the problem into high-level assignment and low-level motion planning. First, we introduce the distributed incremental Hungarian-based assignment (DIHBA) algorithm, a communication-efficient method that achieves globally optimal assignments without a central coordinator. For motion planning, a lightweight planner uses a pre-computed library of time-optimal, dynamically feasible trajectories, enabling rapid, safe, and formation-aware online trajectory selection. Comprehensive simulations demonstrate that the framework achieves globally optimal assignments for swarms of up to 200 UAVs with communication costs lower than conventional distributed Hungarian methods, while maintaining superior formation integrity during transformations in cluttered environments.
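The assignment subproblem itself is the classic linear sum assignment; a tiny brute-force reference makes the "globally optimal" target concrete. This is not the paper's distributed DIHBA, which avoids exactly this kind of centralized enumeration; Hungarian-type methods solve the same problem in O(n³):

```python
from itertools import permutations

def optimal_assignment(cost):
    """Globally optimal one-to-one UAV-to-slot assignment by brute force.
    cost[i][j] is the cost of sending UAV i to formation slot j; returns
    the permutation p minimizing sum_i cost[i][p[i]]."""
    n = len(cost)
    return list(min(permutations(range(n)),
                    key=lambda p: sum(cost[i][p[i]] for i in range(n))))
```

With squared-distance costs, this is the benchmark any distributed assignment scheme must match; the brute force is only usable for tiny n, which is why Hungarian-based methods matter at swarm scale.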
  • Unmanned aerial vehicles (UAVs) face the challenge of autonomous obstacle avoidance in complex, multi-obstacle environments. Behavior cloning offers a promising way to rapidly acquire a control policy from limited expert demonstrations. However, pure imitation learning inherently suffers from poor exploration and limited generalization, typically necessitating extensive datasets to train competent student policies. We utilize a cross-modal variational autoencoder (CM-VAE) to extract compact features from raw visual inputs and UAV states, which then feed into a policy network. We evaluated our approach in a simulated environment featuring a challenging circular trajectory with eight gate obstacles. The results show that the policy trained with pure behavior cloning consistently failed, whereas our behavior cloning method augmented with dataset aggregation (DAgger) successfully traversed all gates without collision. Our findings confirm that DAgger effectively mitigates the shortcomings of behavior cloning, enabling reliable and sample-efficient navigation policies for UAVs.
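The DAgger augmentation can be sketched generically: the learner's own rollouts are relabeled by the expert and aggregated into the dataset before each refit, so training covers the states the learner actually visits rather than only the expert's. Function names here are placeholders, not the paper's API:

```python
def dagger(expert, learner_fit, rollout, n_iters=3):
    """DAgger: roll out the current policy, have the expert relabel the
    visited states, aggregate all (state, expert action) pairs, and refit.
    - expert(s):          expert action for state s
    - learner_fit(data):  trains and returns a policy from labeled pairs
    - rollout(policy):    list of states visited when running policy"""
    dataset = []
    policy = expert  # iteration 0 behaves like plain behavior cloning
    for _ in range(n_iters):
        states = rollout(policy)
        dataset.extend((s, expert(s)) for s in states)  # expert relabels
        policy = learner_fit(dataset)
    return policy
```

This is why DAgger repairs the distribution-shift failure of pure behavior cloning: a small mistake no longer drives the learner into states absent from the training data.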
  • This paper proposes a trajectory tracking control scheme for vertical/short take-off and landing (V/STOL) vehicles. Owing to their large number of controllable degrees of freedom and strong nonlinearity, the design of flight control systems for such vehicles presents considerable challenges, particularly in developing controllers capable of accurately tracking specified trajectories. Building on existing control strategies for various vehicle types, this study introduces an extended control framework tailored to V/STOL systems. The proposed scheme consists of two nested loops: an outer position control loop and an inner attitude control loop. The position loop employs a proportional-integral-derivative (PID) control algorithm, whereas the attitude loop utilizes an anti-saturation integral sliding mode control algorithm. This approach effectively alleviates the integral oversaturation (windup) issue inherent in conventional sliding mode methods and suppresses chattering through a boundary-layer technique. Simulation results demonstrate the efficacy of the proposed control strategy.
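The boundary-layer technique mentioned above is commonly implemented by replacing the discontinuous sign(s) in the sliding-mode reaching law with a saturation function; a minimal sketch (the boundary-layer width φ is a design parameter, not a value from the paper):

```python
def sat(s, phi):
    """Boundary-layer saturation: linear inside the layer |s| <= phi,
    saturating to +/-1 outside. Substituting sat(s, phi) for sign(s) in
    the switching term u = -k * sign(s) smooths the control near the
    sliding surface and suppresses chattering."""
    return max(-1.0, min(1.0, s / phi))
```

The trade-off is standard: a wider layer φ gives a smoother control signal but only guarantees convergence to a neighborhood of the sliding surface rather than the surface itself.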