Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue (3): 644-665. doi: 10.23919/JSEE.2024.000022

• SYSTEMS ENGINEERING •

UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience

Guang ZHAN1, Kun ZHANG1,2,*, Ke LI1, Haiyin PIAO1

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
  2. Science and Technology on Electro-Optic Control Laboratory, Luoyang 471009, China
  • Received: 2022-06-14 Online: 2024-06-18 Published: 2024-06-19
  • Contact: Kun ZHANG E-mail: zhanguang@mail.nwpu.edu.cn; kunzhang@nwpu.edu.cn; keli_iat@mail.nwpu.edu.cn; haiyinpiao@mail.nwpu.edu.cn
  • About author:
    ZHAN Guang was born in 1979. He received his B.S. and M.S. degrees from Beihang University, China, in 2001 and 2004, respectively. He is currently pursuing his Ph.D. degree in electronics and information technology with the School of Electronic Information at Northwestern Polytechnical University. His research interests are unmanned aerial vehicles and cluster intelligence. E-mail: zhanguang@mail.nwpu.edu.cn

    ZHANG Kun was born in 1982. He received his Ph.D. degree from Northwestern Polytechnical University in 2010. He is an associate professor at Northwestern Polytechnical University. His research interests are intelligent air combat, intelligent control and decision-making, integrated avionics system simulation and testing, and advanced control theory and application. E-mail: kunzhang@nwpu.edu.cn

    LI Ke was born in 1996. He received his B.S. degree from Northwestern Polytechnical University, Xi’an, China, in 2018. He is currently pursuing his Ph.D. degree at Northwestern Polytechnical University. His research interests are autonomous flight of unmanned aerial vehicles and unmanned aerial vehicle swarm control. E-mail: keli_iat@mail.nwpu.edu.cn

    PIAO Haiyin was born in 1984. He received his M.S. degree in computer science from Dalian University of Technology, China, in 2010. He is currently working toward his Ph.D. degree in the School of Electronics and Information, Northwestern Polytechnical University, China. His current research interests include deep learning, multi-agent reinforcement learning, and game theory, with particular attention to aerospace applications. E-mail: haiyinpiao@mail.nwpu.edu.cn
  • Supported by:
    This work was supported by the Key Research and Development Program of Shaanxi (2022GXLH-02-09), the Aeronautical Science Foundation of China (20200051053001), and the Natural Science Basic Research Program of Shaanxi (2020JM-147).

Abstract:

Autonomous unmanned aerial vehicle (UAV) manipulation is necessary for the defense department to execute tactical missions given by commanders in the future unmanned battlefield. A large amount of research has been devoted to improving the autonomous decision-making ability of UAVs in interactive environments, where finding the optimal maneuvering decision-making policy has become one of the key issues for enabling UAV intelligence. In this paper, we propose a maneuvering decision-making algorithm for autonomous air-delivery based on deep reinforcement learning under the guidance of expert experience. Specifically, we refine the guidance-towards-area and guidance-towards-specific-point tasks for the air-delivery process based on traditional air-to-surface fire control methods. Moreover, we construct the UAV maneuvering decision-making model based on Markov decision processes (MDPs) and present a reward shaping method for the two guidance tasks using a potential-based function and expert-guided advice. The proposed algorithm can accelerate the convergence of the maneuvering decision-making policy and increase the stability of the policy output during the later stage of the training process. The effectiveness of the proposed policy is illustrated by the training curves and by extensive experimental results from testing the trained policy.
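
The abstract names potential-based reward shaping but does not give the potential function itself. As a minimal illustrative sketch (not the authors' formulation), the standard shaping term F(s, s') = γΦ(s') − Φ(s) can be written as follows. The potential used here, the negative Euclidean distance to a target point, and the names `potential`, `shaped_reward`, and `gamma` are assumptions chosen only for illustration.

```python
import math


def potential(position, target):
    """Hypothetical potential: negative Euclidean distance to the target.

    This choice is an assumption for illustration; the abstract does not
    state which potential function is used.
    """
    return -math.dist(position, target)


def shaped_reward(env_reward, position, next_position, target, gamma=0.99):
    """Return the environment reward plus the potential-based shaping term.

    The shaping term has the standard form gamma * phi(s') - phi(s),
    so moving closer to the target yields a positive bonus.
    """
    shaping = gamma * potential(next_position, target) - potential(position, target)
    return env_reward + shaping


if __name__ == "__main__":
    target = (0.0, 0.0)
    # A step that halves the distance to the target (10 -> 5).
    print(shaped_reward(0.0, (6.0, 8.0), (3.0, 4.0), target, gamma=1.0))
```

With this assumed potential, a step towards the target is rewarded and a step away is penalised by a comparable amount, which is the kind of dense guidance signal that reward shaping is meant to add to a sparse task reward.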

Key words: unmanned aerial vehicle (UAV), maneuvering decision-making, autonomous air-delivery, deep reinforcement learning, reward shaping, expert experience