Journal of Systems Engineering and Electronics ›› 2024, Vol. 35 ›› Issue (3): 644-665. doi: 10.23919/JSEE.2024.000022
• SYSTEMS ENGINEERING •
UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience
Guang ZHAN1, Kun ZHANG1,2,*, Ke LI1, Haiyin PIAO1
Received: 2022-06-14
Online: 2024-06-18
Published: 2024-06-19
Contact: Kun ZHANG
E-mail: zhanguang@mail.nwpu.edu.cn; kunzhang@nwpu.edu.cn; keli_iat@mail.nwpu.edu.cn; haiyinpiao@mail.nwpu.edu.cn
Guang ZHAN, Kun ZHANG, Ke LI, Haiyin PIAO. UAV maneuvering decision-making algorithm based on deep reinforcement learning under the guidance of expert experience[J]. Journal of Systems Engineering and Electronics, 2024, 35(3): 644-665.
Table 1  Details of parameters involved in the environment
| Parameter | Range | Meaning |
|  |  | Distance between the UAV and the target point |
|  |  | Relative azimuth between the LOS and the nose direction of the UAV |
|  |  | UAV speed |
|  |  | UAV height |
|  |  | Horizontal range of the bomb |
|  |  | Steering overload of the UAV |
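For reference, the quantities in Table 1 can be grouped into a single observation record. The Python sketch below is illustrative only: the field names are our own assumptions and do not reproduce the paper's symbols or value ranges.

```python
from dataclasses import dataclass

@dataclass
class UavObservation:
    """Environment quantities of Table 1; field names are illustrative, not the paper's symbols."""
    distance_to_target: float     # distance between the UAV and the target point
    relative_azimuth: float       # angle between the LOS and the nose direction of the UAV
    speed: float                  # UAV speed
    height: float                 # UAV height
    bomb_horizontal_range: float  # horizontal range of the bomb
    steering_overload: float      # steering overload of the UAV
```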
Table 2  Parameter assignment of the algorithm for the guidance towards area task
| Parameter | Value | Meaning |
|  | 100 | Policy's learning period |
|  | 100000 | Historical buffer capacity |
|  | 0.01 | Soft updating parameter |
|  | 128 | Size of minibatch |
|  | 0.001 | Actor networks' learning rate |
|  | 0.001 | Critic networks' learning rate |
|  | 0.5 | Availability exponent of PER |
|  | 0.4 | Initial IS exponent |
|  | 1000 | Maximum training episodes |
|  | 5000 | Maximum steps per episode |
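The hyperparameters in Table 2 map directly onto a configuration object for a PER-DDPG trainer. A minimal sketch, with hypothetical key names and the values taken from Table 2:

```python
# Hypothetical key names; values follow Table 2 (guidance towards area task).
GUIDANCE_TOWARDS_AREA_CONFIG = {
    "policy_learning_period": 100,      # policy's learning period
    "replay_buffer_capacity": 100_000,  # historical buffer capacity
    "tau": 0.01,                        # soft updating parameter for the target networks
    "minibatch_size": 128,              # size of minibatch
    "actor_lr": 1e-3,                   # actor networks' learning rate
    "critic_lr": 1e-3,                  # critic networks' learning rate
    "per_alpha": 0.5,                   # availability exponent of PER
    "per_beta_init": 0.4,               # initial importance-sampling (IS) exponent
    "max_episodes": 1000,               # maximum training episodes
    "max_steps_per_episode": 5000,      # maximum steps per episode
}
```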
Table 3  Structure of critic network ${\boldsymbol{Q}}\left( {{\boldsymbol{s}},{\boldsymbol{a}}|{{\boldsymbol{\theta}} _{\boldsymbol{Q}}}} \right)$ for the guidance towards area task
| Layer name | Number of units | Activation function |
| Input layer of state | 16 | ReLU |
| Input layer of action | 16 | ReLU |
| Hidden layer 1 | 32 | ReLU |
| Hidden layer 2 | 64 | ReLU |
| Hidden layer 3 | 32 | ReLU |
| Output layer | 1 | None |
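Table 3 describes a critic that embeds the state and the action separately before merging them. The PyTorch sketch below is one plausible reading of that layout; the concatenation of the two 16-unit embeddings before hidden layer 1 and the state/action dimensions are assumptions, not details given in the table.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Critic Q(s, a | theta_Q) with the layer sizes of Table 3 (fusion by concatenation is assumed)."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.state_in = nn.Linear(state_dim, 16)    # input layer of state, ReLU
        self.action_in = nn.Linear(action_dim, 16)  # input layer of action, ReLU
        self.trunk = nn.Sequential(
            nn.Linear(32, 32), nn.ReLU(),           # hidden layer 1
            nn.Linear(32, 64), nn.ReLU(),           # hidden layer 2
            nn.Linear(64, 32), nn.ReLU(),           # hidden layer 3
            nn.Linear(32, 1),                       # output layer, no activation
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        s = torch.relu(self.state_in(state))
        a = torch.relu(self.action_in(action))
        return self.trunk(torch.cat([s, a], dim=-1))
```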
Table 4  Structure of actor network ${\boldsymbol{\mu}} \left( {{\boldsymbol{s}}|{{\boldsymbol{\theta}} _{\boldsymbol{\mu}} }} \right)$ for the guidance towards area task
| Layer name | Number of units | Activation function |
| Input layer | 16 | Tanh |
| Hidden layer 1 | 32 | Tanh |
| Hidden layer 2 | 64 | Tanh |
| Hidden layer 3 | 32 | Tanh |
| Output layer | 1 | Tanh |
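The actor in Table 4 is a plain feed-forward network with tanh activations throughout, so its single output is naturally bounded in [-1, 1]. A sketch under the same assumptions as the critic above; the state dimension and any rescaling of the output to the physical action range are left to the caller.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor mu(s | theta_mu) with the layer sizes of Table 4."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 16), nn.Tanh(),  # input layer
            nn.Linear(16, 32), nn.Tanh(),         # hidden layer 1
            nn.Linear(32, 64), nn.Tanh(),         # hidden layer 2
            nn.Linear(64, 32), nn.Tanh(),         # hidden layer 3
            nn.Linear(32, 1), nn.Tanh(),          # output layer, action in [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```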
Table 5  Statistical results of MC test experiments in the guidance towards area task
| Method | Number of experiments | Number of successful experiments | Success rate |
| UER-DDPG and RS without advice | 1000 | 998 | 0.998 |
| PER-DDPG and RS without advice | 1000 | 999 | 0.999 |
| UER-DDPG and RS with advice | 1000 | 999 | 0.999 |
| PER-DDPG and RS with advice | 1000 | 999 | 0.999 |
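The success rates in Table 5 are simply the number of successful runs divided by the 1000 Monte Carlo trials, e.g. 998/1000 = 0.998 for UER-DDPG and RS without advice:

```python
# Success rate as reported in Tables 5 and 9: successes / total Monte Carlo trials.
def success_rate(successes: int, trials: int) -> float:
    return successes / trials

print(success_rate(998, 1000))  # 0.998, UER-DDPG and RS without advice
```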
Table 6  Parameter assignment of the algorithm for the guidance towards specific point task
| Parameter | Value | Meaning |
|  | 100 | Policy's learning period |
|  | 50000 | Historical buffer capacity |
|  | 0.01 | Soft updating parameter |
|  | 256 | Size of minibatch |
|  | 0.001 | Actor networks' learning rate |
|  | 0.001 | Critic networks' learning rate |
|  | 0.5 | Availability exponent of PER |
|  | 0.1 | Initial IS exponent |
|  | 3000 | Maximum training episodes |
|  | 5000 | Maximum steps per episode |
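Relative to the guidance towards area task (Table 2), only a few settings change for the guidance towards specific point task: a smaller replay buffer, a larger minibatch, a lower initial IS exponent, and more training episodes. Continuing the hypothetical configuration sketch from above:

```python
# Hypothetical override of the area-task settings with the values that differ in Table 6.
GUIDANCE_TOWARDS_SPECIFIC_POINT_CONFIG = {
    **GUIDANCE_TOWARDS_AREA_CONFIG,
    "replay_buffer_capacity": 50_000,  # historical buffer capacity
    "minibatch_size": 256,             # size of minibatch
    "per_beta_init": 0.1,              # initial IS exponent
    "max_episodes": 3000,              # maximum training episodes
}
```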
Table 7  Structure of critic network ${\boldsymbol{Q}}\left( {{\boldsymbol{s}},{\boldsymbol{a}}|{{\boldsymbol{\theta}} _{\boldsymbol{Q}}}} \right)$ for the guidance towards specific point task
| Layer name | Number of units | Activation function |
| Input layer of state | 16 | ReLU |
| Input layer of action | 16 | ReLU |
| Hidden layer 1 | 32 | ReLU |
| Hidden layer 2 | 64 | ReLU |
| Hidden layer 3 | 32 | ReLU |
| Output layer | 1 | None |
Table 8  Structure of actor network ${\boldsymbol{\mu}} \left( {{\boldsymbol{s}}|{{\boldsymbol{\theta }}_{\boldsymbol{\mu}} }} \right)$ for the guidance towards specific point task
| Layer name | Number of units | Activation function |
| Input layer | 16 | Tanh |
| Hidden layer 1 | 32 | Tanh |
| Hidden layer 2 | 64 | Tanh |
| Hidden layer 3 | 32 | Tanh |
| Output layer | 1 | Tanh |
Table 9  Statistical results of MC test experiments in the guidance towards specific point task
| Method | Number of experiments | Number of successful experiments | Success rate |
| UER-DDPG and RS without advice | 1000 | 743 | 0.743 |
| PER-DDPG and RS without advice | 1000 | 893 | 0.893 |
| UER-DDPG and RS with advice | 1000 | 957 | 0.957 |
| PER-DDPG and RS with advice | 1000 | 943 | 0.943 |