Journal of Systems Engineering and Electronics ›› 2025, Vol. 36 ›› Issue (6): 1692-1708. doi: 10.23919/JSEE.2025.000112

• CONTROL THEORY AND APPLICATION •

λ-return-based aircraft maneuvering for terminal defense and positioning guidance strategies

Shijie DENG1,*, Yingxin KOU1, Maolong LYU2, Zhanwu LI1, An XU1

  1. Aeronautics Engineering School, Air Force Engineering University, Xi’an 710038, China
    2. Air Traffic Control and Navigation College, Air Force Engineering University, Xi’an 710038, China
  • Received: 2024-03-06 Online: 2025-12-18 Published: 2026-01-07
  • Contact: Shijie DENG E-mail: dengshijiesilence@163.com; kgykyx@hotmail.com; maolonglv@163.com; afeulzw@189.cn; 18157494594@163.com
  • About authors:
    DENG Shijie was born in 1997. He received his M.S. degree in armament science and technology from Air Force Engineering University, Xi’an, China, in 2021. He is currently pursuing his Ph.D. degree at the School of Aeronautical Engineering, Air Force Engineering University. His research interests include the application of reinforcement learning to air combat maneuver decision-making, air combat decision-making cognition, and environmental model learning based on artificial neural networks. He has also conducted research in fields such as airborne firepower command and control, missile guidance, and target detection in hyperspectral data. E-mail: dengshijiesilence@163.com

    KOU Yingxin was born in 1965. He received his M.S. and Ph.D. degrees from Air Force Engineering University, Xi’an, China, in 1997 and 2010, respectively. He is currently a professor with the Aeronautics Engineering School, Air Force Engineering University. His research interests include air combat, nonlinear and adaptive control, artificial intelligence, multi-sensor data fusion, and optimized target assignment. E-mail: kgykyx@hotmail.com

    LYU Maolong was born in 1993. He received his B.S. degree in electrical engineering and automation and his M.S. degree in control science and engineering from Air Force Engineering University, Xi’an, China, in 2014 and 2016, respectively, and his Ph.D. degree in intelligent control from the Delft Center for Systems and Control, Delft University of Technology, Delft, the Netherlands, in 2021. He is currently an associate professor with the College of Air Traffic Control and Navigation, Air Force Engineering University. His research interests include adaptive learning control, deep reinforcement learning, and intelligent decision-making, with applications in multiagent systems, hypersonic flight vehicles, and unmanned autonomous systems. E-mail: maolonglv@163.com

    LI Zhanwu was born in 1978. He received his M.S. degree from Air Force Engineering University, Xi’an, China, in 2007. He is currently a professor with the Aeronautics Engineering School, Air Force Engineering University. His research mainly focuses on multi-sensor data fusion, machine intelligence, and artificial intelligence. E-mail: afeulzw@189.cn

    XU An was born in 1984. He received his M.S. and Ph.D. degrees in armament science and technology from Air Force Engineering University, Xi’an, China, in 2009 and 2012, respectively. He is currently a postdoctoral researcher at the Electronic Information College of Northwestern Polytechnical University, Xi’an, China, as well as an associate professor with the Aeronautics Engineering School, Air Force Engineering University. His research interests include nonlinear and adaptive control, artificial intelligence, and pattern recognition. E-mail: 18157494594@163.com

Abstract:

To address the terminal defense problem of aircraft, this paper proposes a method that simultaneously achieves terminal defense and seizes the dominant position. The method employs a λ-return-based reinforcement learning algorithm that can be applied to flight-assistance decision-making systems to improve pilot survivability. First, we model the environment to simulate the interaction between air-to-air missiles and aircraft. We then propose a λ-return-based approach to improve the deep Q-network (DQN), advantage actor-critic (A2C), and proximal policy optimization (PPO) algorithms used to train maneuver strategies. The method uses an action space of nine maneuvers and defines the off-target distance at the end of the episode as a sparse reward for training. Simulation results show that the λ-return method significantly accelerates the convergence of all three improved algorithms. Moreover, ablation experiments verify the effect of the value of λ on the convergence speed. To address the problem of illegal actions during training, we also design a backtracking-based illegal-action masking mechanism, which improves the data-generation efficiency of the environment model and enables effective algorithm training.
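For readers unfamiliar with the λ-return, the sketch below is an illustration only (not the authors' code; the function name, array layout, and sample numbers are assumptions). It shows how λ-return targets blend n-step returns via the standard recursion G_t = r_t + γ[(1−λ)V(s_{t+1}) + λG_{t+1}], so that a sparse end-of-episode reward, such as the final off-target distance, is propagated back to earlier states; λ = 0 recovers one-step TD targets and λ = 1 the Monte Carlo return.

```python
import numpy as np

def lambda_returns(rewards, next_values, gamma=0.99, lam=0.9):
    """Illustrative lambda-return targets for one episode.

    rewards[t]     : reward observed after the action at step t
    next_values[t] : bootstrap estimate V(s_{t+1}); 0 for terminal states
    Recursion: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = next_values[-1]                       # bootstrap beyond the last step (0 if terminal)
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

# Hypothetical example: only the terminal step carries a (negative) miss-distance reward.
rewards = np.array([0.0, 0.0, 0.0, -12.5])
next_values = np.array([0.1, 0.2, 0.1, 0.0])  # V(s_{t+1}); 0 at the terminal state
print(lambda_returns(rewards, next_values, gamma=0.99, lam=0.9))
```

With λ close to 1, the terminal reward appears (discounted) in every target of the episode rather than only in the last one, which is why the λ-return speeds up learning from sparse rewards compared with one-step TD targets.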

Key words: λ-return, terminal defense, positioning guidance, reinforcement learning