Robust Close-Range Air Combat Maneuver Decision-Making Method Based on Opponent Modeling and Reinforcement Learning

Hu Liu , Yixiong Yu , Yongliang Tian , Chuangyin Dang

Journal of Systems Science and Systems Engineering: 1-34. DOI: 10.1007/s11518-025-5699-z


Abstract

In the context of one-versus-one close-range air combat involving Unmanned Combat Aerial Vehicles (UCAVs), existing Reinforcement Learning (RL) methods exhibit a significant trade-off between generalization performance and combat efficiency. Training strategies tailored to specific opponents enhance kill rates but suffer from limited generalization. Conversely, models that approximate a Nash Equilibrium through diversified opponent strategies achieve robust generalization but incur reduced efficiency due to training complexity and conservative decision-making. To address this challenge, this paper proposes a robust maneuver decision-making approach based on Opponent Modeling and Reinforcement Learning (OMRL). Grounded in the perspective of the Information Horizon, this approach categorizes opponent strategies into Long-Sighted Strategies, Short-Sighted Strategies, and Fixed Strategies. By employing a Long Short-Term Memory (LSTM) network, OMRL accurately classifies opponent trajectories and leverages the Proximal Policy Optimization algorithm to train targeted solution strategies, thereby constructing an efficient adversarial framework. OMRL identifies opponent strategy types in real time and invokes the corresponding solution strategies, effectively balancing generalization performance and combat efficiency. Experimental results demonstrate that OMRL achieves an average win rate of 0.64 in testing, surpassing other state-of-the-art RL methods. This study is the first to introduce Information Horizon-based classification, systematically analyzing the characteristics of various strategies, training an LSTM classifier on trajectory data, and developing the OMRL framework. Adversarial experiments and ablation studies validate the superiority and scalability of OMRL, providing an innovative theoretical and practical foundation for efficient collaborative combat involving UCAVs.
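The classify-then-dispatch idea at the core of OMRL can be sketched in a few lines. The sketch below is an illustration, not the authors' implementation: a minimal NumPy LSTM forward pass with random weights stands in for the trained trajectory classifier, and simple stub callables stand in for the PPO-trained solution policies. The names `TinyLSTMClassifier`, `omrl_step`, and `STRATEGY_TYPES` are hypothetical; only the three strategy categories (long-sighted, short-sighted, fixed) come from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMClassifier:
    """Minimal LSTM forward pass mapping an opponent trajectory
    (a sequence of state vectors) to a distribution over strategy types.
    Weights are random here; the paper trains them on trajectory data."""
    def __init__(self, n_in, n_hid, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, output, and cell gates.
        self.W = rng.standard_normal((4 * n_hid, n_in + n_hid)) * 0.1
        self.b = np.zeros(4 * n_hid)
        self.Wo = rng.standard_normal((n_out, n_hid)) * 0.1
        self.n_hid = n_hid

    def forward(self, traj):
        h = np.zeros(self.n_hid)
        c = np.zeros(self.n_hid)
        for x in traj:  # unroll over the observed trajectory
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
        logits = self.Wo @ h
        p = np.exp(logits - logits.max())  # softmax over strategy types
        return p / p.sum()

# The three opponent-strategy categories from the Information Horizon view.
STRATEGY_TYPES = ["long_sighted", "short_sighted", "fixed"]

# Hypothetical per-type solution policies (stand-ins for PPO-trained networks).
policies = {t: (lambda obs, t=t: f"maneuver_for_{t}") for t in STRATEGY_TYPES}

def omrl_step(classifier, traj, obs):
    """Identify the opponent's strategy type from its trajectory,
    then invoke the matching solution policy."""
    probs = classifier.forward(traj)
    opponent_type = STRATEGY_TYPES[int(np.argmax(probs))]
    return policies[opponent_type](obs), opponent_type

clf = TinyLSTMClassifier(n_in=6, n_hid=16, n_out=3)
traj = [np.random.default_rng(1).standard_normal(6) for _ in range(20)]
action, opp_type = omrl_step(clf, traj, obs=None)
```

In the full method, the classifier runs online during an engagement, so the dispatched policy can change as more of the opponent's trajectory is observed.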

Keywords

Air combat / opponent modeling / reinforcement learning / self-play / Nash equilibrium

Cite this article

Download citation ▾
Hu Liu, Yixiong Yu, Yongliang Tian, Chuangyin Dang. Robust Close-Range Air Combat Maneuver Decision-Making Method Based on Opponent Modeling and Reinforcement Learning. Journal of Systems Science and Systems Engineering, 1-34. DOI: 10.1007/s11518-025-5699-z



RIGHTS & PERMISSIONS

Systems Engineering Society of China and Springer-Verlag GmbH Germany
