Journal of Systems Engineering and Electronics, 2021, Vol. 32, Issue (6): 1421-1438. doi: 10.23919/JSEE.2021.000121

• SYSTEMS ENGINEERING •

UAV cooperative air combat maneuver decision based on multi-agent reinforcement learning

Jiandong ZHANG1, Qiming YANG1,*, Guoqing SHI1, Yi LU2, Yong WU1

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
  2. Shenyang Aircraft Design Institute, Shenyang 110035, China
  • Received: 2020-12-22 Online: 2022-01-05 Published: 2022-01-05
  • Contact: Qiming YANG E-mail: jdzhang@nwpu.edu.cn; yangqm@nwpu.edu.cn; shiguoqing@nwpu.edu.cn; yiluemail@126.com; yongwu@nwpu.edu.cn
  • About author:
    ZHANG Jiandong was born in 1974. He is an associate professor at the Department of System and Control Engineering in Northwestern Polytechnical University, China. He received both his M.S. and Ph.D. degrees in system engineering from the same university. His research interests include modeling, simulation, and effectiveness evaluation of complex systems, development and design of integrated avionics systems, and system measurement and test technologies. E-mail: jdzhang@nwpu.edu.cn
    YANG Qiming was born in 1988. He received his master's degree from Northwestern Polytechnical University (NPU), Xi’an, China in 2013. He was awarded a Ph.D. degree in electronic science and technology in 2020. He is an assistant researcher at NPU. His main research interests are artificial intelligence and its application to the control and decision-making of UAVs. E-mail: yangqm@nwpu.edu.cn
    SHI Guoqing was born in 1974. He is an associate professor at the Department of System and Control Engineering in Northwestern Polytechnical University, China. He received his M.S. and Ph.D. degrees in system engineering from the same university. His research interests include integrated avionics system measurement and test technologies, development and design of embedded real-time systems, and modeling, simulation, and effectiveness evaluation of complex systems. E-mail: shiguoqing@nwpu.edu.cn
    LU Yi was born in 1975. He graduated from Nanjing University of Aeronautics and Astronautics in 1998, majoring in aircraft guidance control and simulation. He is currently the deputy chief designer of Shenyang Aircraft Design Institute and is mainly engaged in fighter avionics system design. E-mail: yiluemail@126.com
    WU Yong was born in 1964. He is a professor at the Department of System and Control Engineering in Northwestern Polytechnical University, China. He received his M.S. degree in system fire control from the same university in 1988. His research interests include integrated avionics system measurement and test technologies, development and design of embedded real-time systems, and modeling, simulation, and effectiveness evaluation of complex systems. E-mail: yongwu@nwpu.edu.cn
  • Supported by:
    This work was supported by the Aeronautical Science Foundation of China (2017ZC53033) and the Seed Foundation of Innovation and Creation for Graduate Students in Northwestern Polytechnical University (CX2020156).

Abstract:

To improve the autonomy of unmanned aerial vehicles (UAVs) in air combat missions, many studies have applied artificial intelligence to autonomous air combat maneuver decision-making. However, most of these studies address individual decision-making in 1v1 scenarios, which rarely occur in actual air combat. Building on research into 1v1 autonomous air combat maneuver decisions, this paper builds a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. First, a bidirectional recurrent neural network (BRNN) is used to achieve communication among individual UAVs, and the multi-UAV cooperative air combat maneuver decision model is established under the actor-critic architecture. Second, by combining target allocation with air combat situation assessment, the tactical goal of the formation is merged with the reinforcement learning goal of each UAV, and a cooperative tactical maneuver policy is generated. The simulation results show that the model established in this paper can obtain a cooperative maneuver policy through reinforcement learning, and that this policy can guide the UAVs to gain an overall situational advantage and defeat their opponents through tactical cooperation.

Key words: decision-making, air combat maneuver, cooperative air combat, reinforcement learning, recurrent neural network
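
The abstract's idea of using a BRNN as a communication channel among UAVs can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, random weights, and all function and parameter names below (`rnn_pass`, `brnn_actor`, `make_params`) are assumptions chosen for illustration, and no actor-critic training is shown. The sketch only demonstrates the forward pass, in which the recurrence runs over the UAVs in the formation (not over time), so each UAV's action scores depend on the observations of every other UAV.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(obs_dim, hidden, n_actions, seed=0):
    """Hypothetical parameter set shared by all UAVs."""
    rng = random.Random(seed)
    return {
        "W_in_f": rand_matrix(hidden, obs_dim, rng),   # forward input weights
        "W_h_f": rand_matrix(hidden, hidden, rng),     # forward recurrent weights
        "W_in_b": rand_matrix(hidden, obs_dim, rng),   # backward input weights
        "W_h_b": rand_matrix(hidden, hidden, rng),     # backward recurrent weights
        "W_out": rand_matrix(n_actions, 2 * hidden, rng),
    }


def rnn_pass(obs_seq, W_in, W_h, hidden):
    """Plain tanh RNN over the sequence of UAVs; returns one state per UAV."""
    h = [0.0] * hidden
    states = []
    for obs in obs_seq:
        pre = [a + b for a, b in zip(matvec(W_in, obs), matvec(W_h, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def brnn_actor(observations, params):
    """Return one action-score vector per UAV.

    The forward pass carries information from UAV 0 toward the last UAV,
    the backward pass carries it the other way, and each UAV's output
    reads the concatenation of both hidden states.
    """
    hidden = len(params["W_h_f"])
    fwd = rnn_pass(observations, params["W_in_f"], params["W_h_f"], hidden)
    bwd = rnn_pass(observations[::-1], params["W_in_b"], params["W_h_b"], hidden)[::-1]
    return [matvec(params["W_out"], f + b) for f, b in zip(fwd, bwd)]


if __name__ == "__main__":
    params = make_params(obs_dim=3, hidden=4, n_actions=2)
    obs = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]  # three UAVs
    for i, scores in enumerate(brnn_actor(obs, params)):
        print(f"UAV {i} action scores: {scores}")
```

Because the hidden state is passed along the formation in both directions, changing the observation of any one UAV changes the scores of all the others, which is the sense in which the recurrent layer acts as communication in this sketch.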