Robust Close-Range Air Combat Maneuver Decision-Making Method Based on Opponent Modeling and Reinforcement Learning
Hu Liu , Yixiong Yu , Yongliang Tian , Chuangyin Dang
Journal of Systems Science and Systems Engineering, pp. 1–34.
In one-versus-one close-range air combat involving Unmanned Combat Aerial Vehicles (UCAVs), existing Reinforcement Learning (RL) methods exhibit a significant trade-off between generalization performance and combat efficiency. Strategies trained against specific opponents achieve high kill rates but generalize poorly. Conversely, models that approximate a Nash Equilibrium through diversified opponent strategies generalize robustly but lose efficiency due to training complexity and conservative decision-making. To address this challenge, this paper proposes a robust maneuver decision-making approach based on Opponent Modeling and Reinforcement Learning (OMRL). Grounded in the perspective of the Information Horizon, this approach categorizes opponent strategies into Long-Sighted Strategies, Short-Sighted Strategies, and Fixed Strategies. Employing a Long Short-Term Memory (LSTM) network, OMRL accurately classifies opponent trajectories and leverages the Proximal Policy Optimization algorithm to train targeted solution strategies, thereby constructing an efficient adversarial framework. OMRL identifies the opponent's strategy type in real time and invokes the corresponding solution strategy, effectively balancing generalization performance and combat efficiency. Experimental results demonstrate that OMRL achieves an average win rate of 0.64 in testing, surpassing other state-of-the-art RL methods. This study is the first to introduce Information Horizon-based classification, systematically analyze the characteristics of the various strategies, train an LSTM classifier on trajectory data, and develop the OMRL framework. Through adversarial experiments and ablation studies, the superiority and scalability of OMRL are validated, providing an innovative theoretical and practical foundation for efficient collaborative combat involving UCAVs.
Air combat / opponent modeling / reinforcement learning / self-play / Nash equilibrium
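The abstract describes a two-stage loop: an LSTM classifies the opponent's observed trajectory into one of three strategy types (long-sighted, short-sighted, fixed), and the framework then dispatches to a PPO policy trained against that type. The paper's actual architecture and feature set are not given here, so the following is only a minimal sketch of that classify-then-dispatch idea; the state dimension, hidden size, and policy names are all assumptions.

```python
import torch
import torch.nn as nn

class OpponentClassifier(nn.Module):
    """Hypothetical LSTM classifier: maps an opponent trajectory
    (a sequence of state vectors) to one of three strategy classes.
    Layer sizes are illustrative assumptions, not the paper's values."""
    def __init__(self, state_dim=12, hidden_dim=64, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, traj):            # traj: (batch, T, state_dim)
        _, (h_n, _) = self.lstm(traj)   # final hidden state summarizes the trajectory
        return self.head(h_n[-1])       # logits over the 3 strategy types

# Placeholder policy labels; in the paper each would be a PPO policy
# trained specifically against that opponent type.
STRATEGY_POLICIES = {0: "long_sighted_ppo", 1: "short_sighted_ppo", 2: "fixed_ppo"}

def select_policy(classifier, traj):
    """Dispatch step: classify the observed opponent trajectory,
    then look up the counter-strategy trained for that class."""
    with torch.no_grad():
        label = int(classifier(traj).argmax(dim=-1).item())
    return STRATEGY_POLICIES[label]
```

At decision time one would call `select_policy` on a sliding window of recent opponent states, re-invoking it as new observations arrive to track strategy switches.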
Systems Engineering Society of China and Springer-Verlag GmbH Germany