Journal of Systems Engineering and Electronics ›› 2023, Vol. 34 ›› Issue (2): 360-373.doi: 10.23919/JSEE.2023.000056

• SYSTEMS ENGINEERING •

Deep reinforcement learning for UAV swarm rendezvous behavior

Yaozhong ZHANG, Yike LI, Zhuoran WU, Jialin XU

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2021-01-08 Online: 2023-04-18 Published: 2023-04-18
  • Contact: Yaozhong ZHANG E-mail: zhang_y_z@nwpu.edu.cn; liyike@mail.nwpu.edu.cn
  • About author:
    ZHANG Yaozhong was born in 1974. He received his B.S., M.S., and Ph.D. degrees in systems engineering from Northwestern Polytechnical University, Xi’an, China, in 1997, 2000, and 2006, respectively. He is an associate professor at the School of Electronics and Information, Northwestern Polytechnical University. His research interests include modeling, simulation, and effectiveness evaluation of complex systems, as well as reinforcement learning and its application to complex systems. E-mail: zhang_y_z@nwpu.edu.cn

    LI Yike was born in 1997. He received his bachelor’s degree in welding technology and engineering from Xi’an Shiyou University in 2019. In 2022, he graduated from Northwestern Polytechnical University with a master’s degree in control engineering. His research interest is UAV path planning based on reinforcement learning. E-mail: liyike@mail.nwpu.edu.cn

    WU Zhuoran was born in 1999. He graduated from Northwestern Polytechnical University with a bachelor’s degree in detection, guidance, and control technology in 2021. He is currently pursuing his master’s degree in control science and engineering at Northwestern Polytechnical University. His research interest is multi-agent deep reinforcement learning. E-mail: 542391943@mail.nwpu.edu.cn

    XU Jialin was born in 1995. He graduated from Northwestern Polytechnical University with a bachelor’s degree in detection, guidance, and control technology in 2018. In 2021, he graduated from Northwestern Polytechnical University with a master’s degree in control engineering. His research interest is UAV swarm decision-making based on deep reinforcement learning. E-mail: xjl@mail.nwpu.edu.cn
  • Supported by:
    This work was supported by the Aeronautical Science Foundation (2017ZC53033).

Abstract:

Unmanned aerial vehicle (UAV) swarm technology has been a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, swarm technology will become one of the main trends in future UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to effectively address the convergence problem caused by sparse rewards in deep reinforcement learning (DRL) over long-horizon tasks. We also propose the concept of a temporary storage area, which optimizes the experience replay unit of the traditional DDQN algorithm, improves its convergence speed, and accelerates training. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to more faithfully represent the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm are trained in different task scenarios. The experimental results validate that the DDQN algorithm efficiently trains the UAV swarm to complete the given collaborative tasks while meeting the swarm's requirements for centralization and autonomy, improving the intelligence of collaborative task execution. The simulation results show that, after training, the proposed UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.
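The two algorithmic ingredients named in the abstract can be sketched briefly. The snippet below is a minimal illustration, not the paper's implementation: `ddqn_targets` is the standard double-DQN target computation (online network selects the greedy action, target network evaluates it), and `ReplayWithStaging` is one hypothetical reading of the proposed "temporary storage area", in which transitions of the current episode are staged and committed to the main replay memory only at episode end. All names and the staging policy are assumptions for illustration.

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN target: the online network selects the greedy next action,
    the target network evaluates it, decoupling selection from evaluation."""
    best_actions = np.argmax(q_online_next, axis=1)               # argmax from online net
    next_values = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_values          # no bootstrap at terminal

class ReplayWithStaging:
    """Replay memory with a temporary staging area (a hypothetical reading of
    the paper's 'temporary storage area'): transitions of the current episode
    are held back and written to the main memory only when the episode ends."""
    def __init__(self, capacity=10_000):
        self.memory, self.staging = [], []
        self.capacity = capacity

    def store(self, transition):
        self.staging.append(transition)            # buffer within the episode

    def end_episode(self):
        self.memory.extend(self.staging)           # commit the whole episode
        self.memory = self.memory[-self.capacity:] # drop oldest beyond capacity
        self.staging.clear()
```

Staging whole episodes lets the agent filter or re-weight an episode's transitions before they enter replay, which is one plausible way such a mechanism could speed up convergence on long-horizon tasks.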

Key words: double deep Q network (DDQN) algorithm, unmanned aerial vehicle (UAV) swarm, task decision, deep reinforcement learning (DRL), sparse rewards