A UAV collaborative defense scheme driven by DDPG algorithm

doi:10.23919/JSEE.2023.000128

Journal of Systems Engineering and Electronics, 2023, Vol. 34, Issue (5): 1211-1224.

• Systems Engineering •

Yaozhong ZHANG¹,*, Zhuoran WU¹, Zhenkai XIONG², Long CHEN³

¹ School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
² College of New Energy and Intelligent Connected Vehicle, Anhui University of Science & Technology, Hefei 231131, China
³ China Research and Development Academy of Machinery Equipment, Beijing 100089, China

Received: 2021-11-12 Online: 2023-10-18 Published: 2023-10-30
Contact: Yaozhong ZHANG E-mail: zhang_y_z@nwpu.edu.cn; 542391943@qq.com; 1223959392@qq.com; dragon-cl@sohu.com
About the authors:
ZHANG Yaozhong was born in 1974. He is an associate professor at the School of Electronics and Information, Northwestern Polytechnical University. His research interests include modeling, simulation and effectiveness evaluation of complex systems, and reinforcement learning and its application to complex systems. E-mail: zhang_y_z@nwpu.edu.cn

WU Zhuoran was born in 1999. He received his bachelor’s degree in detection guidance and control technology and his master’s degree in control science and engineering, both from the School of Electronics and Information, Northwestern Polytechnical University. His research interest is reinforcement learning. E-mail: 542391943@qq.com

XIONG Zhenkai was born in 1979. He received his Ph.D. degree from the College of Mechanical and Electrical Engineering, Harbin Engineering University, in 2012. His main research interests include intelligent control, high-precision control, embedded systems, optimal estimation theory and its applications, and filtering algorithms. E-mail: 1223959392@qq.com

CHEN Long was born in 1967. He received his bachelor’s degree from Changchun Institute of Optics and Precision Mechanics, and his master’s degree from Beijing Institute of Technology. He is a researcher at the China Research and Development Academy of Machinery Equipment, specializing in intelligent systems. He has been extensively involved in equipment system research and has served as chief designer for the development of multiple types of equipment. He has received numerous provincial and ministerial-level awards for scientific and technological advancement. E-mail: dragon-cl@sohu.com
Supported by:
This work was supported by the Key Research and Development Program of Shaanxi (2022GY-089) and the Natural Science Basic Research Program of Shaanxi (2022JQ-593).

Abstract

The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines the two mainstream families of reinforcement learning methods, those based on value iteration and those based on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a DDPG-based cooperative defense scheme for unmanned aerial vehicle (UAV) swarms is developed and validated, and it shows promising practical value in defense effectiveness. We address the sparse reward problem that reinforcement learning faces in long-term tasks by constructing a reward function for the UAV swarm, and we optimize the learning process of the artificial neural networks in the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm’s requirements for decentralization and autonomy, and promoting the intelligent development of UAV swarms and of their decision-making process.

Key words: deep deterministic policy gradient (DDPG) algorithm, unmanned aerial vehicle (UAV) swarm, task decision making, deep reinforcement learning, sparse reward problem
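
For readers unfamiliar with the ingredients named in the abstract, the following is a minimal sketch of two standard DDPG components (the critic's bootstrapped target and the soft target-network update), together with one common way to densify a sparse reward. It is not the authors' implementation: the function names, the default values of `gamma`, `tau`, `bonus` and `scale`, and the distance-based reward are all assumptions made for illustration, and the reward function actually used in the paper is not given on this page.

```python
from typing import List


def td_target(reward: float, next_q: float, done: bool, gamma: float = 0.99) -> float:
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    next_q stands for the target critic evaluated at the next state and the
    target actor's action; no bootstrapping is applied at episode end.
    """
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target: List[float], online: List[float], tau: float = 0.01) -> List[float]:
    """Polyak averaging of parameters: theta_target <- tau * theta + (1 - tau) * theta_target.

    Moving the target networks slowly is the standard DDPG device for damping
    oscillation during training.
    """
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def shaped_reward(prev_dist: float, dist: float, intercepted: bool,
                  bonus: float = 10.0, scale: float = 1.0) -> float:
    """Hypothetical dense reward: progress toward an intruder plus a terminal bonus.

    This is an illustrative assumption, not the reward defined in the paper.
    """
    return scale * (prev_dist - dist) + (bonus if intercepted else 0.0)
```

In this sketch the per-step progress term gives the agent a learning signal long before the rare terminal event occurs, which is the general idea behind shaping a reward for long-term tasks.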

Yaozhong ZHANG, Zhuoran WU, Zhenkai XIONG, Long CHEN. A UAV collaborative defense scheme driven by DDPG algorithm[J]. Journal of Systems Engineering and Electronics, 2023, 34(5): 1211-1224.
