Journal of Systems Engineering and Electronics ›› 2026, Vol. 37 ›› Issue (2): 616-635. doi: 10.23919/JSEE.2026.000079

• SYSTEMS ENGINEERING •

Application of self-play reinforcement learning and explainable decision tree in intelligent air combat

Jingbo WANG1,2, Liaoyuan ZHU2, Shaojie XIA2, Huibin LIU2, Jing LIU2, Chongxiao QU2,*, Zhihuan SONG1

  1. College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
  2. The 52nd Research Institute of China Electronics Technology Group Corporation, Hangzhou 311100, China
  • Received: 2024-01-08 Online: 2026-04-18 Published: 2026-04-30
  • Contact: Chongxiao QU E-mail: wangjingbo2@cetc.com.cn; zhuliaoyuan1@cetc.com.cn; xiashaojie@cetc.com.cn; liuhuibin@cetc.com.cn; liujing2@cetc.com.cn; quchongxiao@cetc.com.cn; songzhihuan@zju.edu.cn
  • About author:
    WANG Jingbo was born in 1995. He received his Ph.D. degree in control science and engineering from Zhejiang University, Hangzhou, China, in 2022. He is currently an engineer in the 52nd Research Institute of China Electronics Technology Group Corporation, and a Postdoctoral Research Fellow with the College of Control Science and Engineering, Zhejiang University. His research interests include intelligent air combat and deep reinforcement learning. E-mail: wangjingbo2@cetc.com.cn

    ZHU Liaoyuan was born in 1992. He received his M.Eng. degree in control engineering from Zhejiang University, Hangzhou, China, in 2017. He is currently an engineer in the 52nd Research Institute of China Electronics Technology Group Corporation. His research interests include intelligent air combat and deep reinforcement learning. E-mail: zhuliaoyuan1@cetc.com.cn

    XIA Shaojie was born in 1989. He received his M.S. degree in control science and engineering from Zhejiang University of Technology, Hangzhou, China, in 2015. He is currently a senior engineer in the 52nd Research Institute of China Electronics Technology Group Corporation. His research interests include autonomous air combat, intelligent decision-making, and deep reinforcement learning. E-mail: xiashaojie@cetc.com.cn

    LIU Huibin was born in 1992. He received his M.Eng. degree in computer technology from Beijing University of Posts and Telecommunications, Beijing, China, in 2021. He is currently an assistant engineer in the 52nd Research Institute of China Electronics Technology Group Corporation. His research interests include air combat simulation and deep reinforcement learning. E-mail: liuhuibin@cetc.com.cn

    LIU Jing was born in 1996. She received her M.S. degree in control science and engineering from Harbin Engineering University, Harbin, China, in 2022. She is currently an assistant engineer in the 52nd Research Institute of China Electronics Technology Group Corporation. Her research interests include air combat simulation and deep reinforcement learning. E-mail: liujing2@cetc.com.cn

    QU Chongxiao was born in 1985. He received his M.S. degree in business administration from Zhejiang University in 2014. He is currently a Ph.D. candidate in electronic and information engineering at the University of Electronic Science and Technology of China, and a professorate senior engineer in the 52nd Research Institute of China Electronics Technology Group Corporation. His research interests include autonomous air combat, intelligent decision-making, and deep reinforcement learning. E-mail: quchongxiao@cetc.com.cn

    SONG Zhihuan was born in 1962. He received his B.Eng. and M.Eng. degrees in industrial automation from Hefei University of Technology, Hefei, China, in 1983 and 1986, respectively, and his Ph.D. degree in industrial automation from Zhejiang University, Hangzhou, China, in 1997. Since 1997, he has been with the College of Control Science and Engineering, Zhejiang University, where he is currently a professor. His research interests include intelligent decision-making, machine learning, reinforcement learning, analytics and applications of big data, and advanced process control technologies. E-mail: songzhihuan@zju.edu.cn
  • Supported by:
    This work was supported by the Joint Funds of the National Natural Science Foundation of China (U2341216).

Abstract:

Deep reinforcement learning algorithms are revolutionizing intelligent decision-making in air combat, drawing widespread attention and extensive research. However, air combat agents trained with these algorithms face significant challenges: their decision-making capacity is limited because they are trained adversarially against relatively fixed and singular expert strategies, and their decision-making processes lack interpretability and reliability. To tackle these issues, this paper proposes a self-play training mechanism based on policy switching and opponent selection, allowing air combat agents to refine their capabilities by engaging with previous versions of themselves. Additionally, an explainable decision tree model is developed to clarify the decision logic of these agents. Simulation results demonstrate that the proposed self-play training approach significantly enhances the decision-making abilities of air combat agents, with late-stage agents showing a 38% improvement over early-stage agents in confrontations with an expert strategy. Moreover, the explainable decision tree model effectively elucidates the decision logic and achieves an 86% win rate against the expert strategy, comparable to the 88% win rate of the air combat agents.

Key words: deep reinforcement learning, intelligent air combat, self-play training, explainable decision tree
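
Illustrative sketch (not from the paper): the abstract describes agents training against previous versions of themselves with an opponent-selection step, but it does not give the selection rule. The Python sketch below shows one common way such a pool could be organized. The names `Snapshot` and `OpponentPool`, the 0.5 prior for unplayed opponents, and the rule of sampling harder opponents more often are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """A frozen earlier version of the agent (hypothetical container)."""
    version: int
    wins: int = 0   # games the current learner won against this snapshot
    games: int = 0  # games the current learner played against this snapshot

    def learner_win_rate(self) -> float:
        # Assumed prior: an unplayed snapshot is treated as an even match.
        return self.wins / self.games if self.games else 0.5


@dataclass
class OpponentPool:
    """Pool of past versions the current agent can be matched against."""
    rng: random.Random = field(default_factory=random.Random)
    snapshots: list = field(default_factory=list)

    def add_snapshot(self) -> Snapshot:
        # Store a new past version; its index doubles as its version number.
        snap = Snapshot(version=len(self.snapshots))
        self.snapshots.append(snap)
        return snap

    def select(self) -> Snapshot:
        if not self.snapshots:
            raise ValueError("opponent pool is empty")
        # Assumed rule: opponents the learner beats less often get more weight.
        # The small constant keeps every snapshot selectable.
        weights = [1.0 - s.learner_win_rate() + 0.05 for s in self.snapshots]
        return self.rng.choices(self.snapshots, weights=weights, k=1)[0]

    def record(self, snap: Snapshot, learner_won: bool) -> None:
        # Update the match statistics used by select().
        snap.games += 1
        snap.wins += int(learner_won)


if __name__ == "__main__":
    pool = OpponentPool(rng=random.Random(0))
    easy, hard = pool.add_snapshot(), pool.add_snapshot()
    for _ in range(10):
        pool.record(easy, learner_won=True)
        pool.record(hard, learner_won=False)
    picks = [pool.select().version for _ in range(1000)]
    print("share of games against the harder snapshot:", picks.count(1) / 1000)
```

Weighting by the learner's loss rate is only one possible choice; uniform sampling or sampling only the most recent snapshots would fit the same interface.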