Journal of Systems Engineering and Electronics, 2022, Vol. 33, Issue (5): 1173-1185. doi: 10.23919/JSEE.2022.000113

• CONTROL THEORY AND APPLICATION •

Hierarchical reinforcement learning guidance with threat avoidance

Bohao LI1,2,3, Yunjie WU1,2,3, Guofei LI4,*

    1 State Key Laboratory of Virtual Reality Technology and System, Beihang University, Beijing 100191, China
    2 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
    3 Science and Technology on Aircraft Control Laboratory, Beijing 100191, China
    4 School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2021-02-24 Online: 2022-10-27 Published: 2022-10-27
  • Contact: Guofei LI, E-mail: libh08@buaa.edu.cn; liguofei1@126.com
  • About authors:
    LI Bohao was born in 1990. He received his B.E. degree from Lanzhou University, Lanzhou, China, in 2012, and his M.S. degree from Lanzhou University of Technology, Lanzhou, China, in 2017. He is currently pursuing his Ph.D. degree in navigation, guidance and control at Beihang University, Beijing, China. His research interests include deep learning, deep reinforcement learning, and guidance. E-mail: libh08@buaa.edu.cn
    WU Yunjie was born in 1969. She received her Ph.D. degree in navigation, guidance and control from Beihang University in 2006. She is now a professor in the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China. Her research interests include system simulation, intelligent control, servo control, and aircraft guidance and control technology. E-mail: wyjmip@buaa.edu.cn
    LI Guofei was born in 1991. He received his Ph.D. degree from the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China, in 2020. From 2020 to 2021, he was a postdoctoral fellow of the Zhuoyue Program in the School of Cyber Science and Technology, Beihang University, Beijing, China. He is now an associate professor in the School of Astronautics, Northwestern Polytechnical University, Xi'an, China. His research interests include cooperative guidance, servo system control, and nonlinear control. E-mail: liguofei1@126.com
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62003021; 91212304)

Abstract:

The guidance strategy is a critical factor in determining the strike effectiveness of a missile. A novel guidance law is presented that exploits deep reinforcement learning (DRL) with a hierarchical deep deterministic policy gradient (DDPG) algorithm. Reward functions are constructed to minimize the line-of-sight (LOS) angle rate and to avoid the threat posed by opposing obstacles. To attenuate chattering of the acceleration command, a hierarchical reinforcement learning structure and an improved reward function with an action penalty are put forward. Simulation results validate that a missile guided by the proposed method can hit the target successfully and keep away from the threat areas effectively.
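The abstract describes the reward design only qualitatively. The following is a minimal reward-shaping sketch in Python, assuming an LOS-rate term, a single circular threat area, and an action-increment penalty; the weights w_los, w_threat, w_action, the threat-zone model, and the function guidance_reward are illustrative assumptions, not the paper's actual formulation.

import numpy as np

def guidance_reward(los_rate, missile_pos, threat_center, threat_radius,
                    accel_cmd, prev_accel_cmd,
                    w_los=1.0, w_threat=5.0, w_action=0.1):
    """Scalar reward combining three shaping terms (illustrative only):
    1) minimize the line-of-sight (LOS) angle rate,
    2) penalize entering a circular threat area,
    3) penalize abrupt changes in the acceleration command (action penalty)
       to attenuate chattering.
    """
    # (1) LOS-rate term: a smaller |LOS rate| yields a higher reward.
    r_los = -w_los * abs(los_rate)

    # (2) Threat-avoidance term: penalty grows as the missile moves deeper
    #     inside the threat circle; zero outside it.
    dist = np.linalg.norm(np.asarray(missile_pos) - np.asarray(threat_center))
    r_threat = -w_threat * max(0.0, threat_radius - dist) / threat_radius

    # (3) Action penalty: discourage large command increments (chattering).
    r_action = -w_action * abs(accel_cmd - prev_accel_cmd)

    return r_los + r_threat + r_action

# Example usage with made-up values:
# r = guidance_reward(los_rate=0.02, missile_pos=(1500.0, 800.0),
#                     threat_center=(2000.0, 900.0), threat_radius=300.0,
#                     accel_cmd=12.0, prev_accel_cmd=10.5)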

Key words: guidance law, deep reinforcement learning (DRL), threat avoidance, hierarchical reinforcement learning