Journal of Systems Engineering and Electronics ›› 2023, Vol. 34 ›› Issue (5): 1343-1358. doi: 10.23919/JSEE.2023.000113

• Control Theory and Application •

LSTM-DPPO based deep reinforcement learning controller for path following optimization of unmanned surface vehicle

Jiawei XIA1,2, Xufang ZHU3,*, Zhong LIU1, Qingtao XIA1

    1 School of Weaponry Engineering, Naval University of Engineering, Wuhan 430033, China
    2 Qingdao Campus, Naval Aviation University, Qingdao 266041, China
    3 School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
  • Received:2021-08-18 Online:2023-10-18 Published:2023-10-30
  • Contact: Xufang ZHU E-mail:491650471@qq.com;1580284687@qq.com;liuzh531@163.com;xiaqing777@163.com
  • About author:
    XIA Jiawei was born in 1994. He received his B.S. and M.S. degrees from Naval University of Engineering (NUE), Wuhan, China, in 2016 and 2019, respectively. He is pursuing his Ph.D. degree in system engineering at NUE, Wuhan, China. His research interests include intelligent control of unmanned surface vehicles, deep reinforcement learning, and multi-agent control. E-mail: 491650471@qq.com

    ZHU Xufang was born in 1978. She received her B.S., M.S., and Ph.D. degrees from Naval University of Engineering (NUE), Wuhan, China, in 1999, 2007, and 2017, respectively. She is a lecturer in the School of Electronic Engineering, NUE. Her research interests include target characteristics, information perception technology, and circuits and systems. E-mail: 1580284687@qq.com

    LIU Zhong was born in 1963. He received his M.S. and Ph.D. degrees from Naval University of Engineering (NUE) and Huazhong University of Science and Technology, Wuhan, China, in 1984 and 2004, respectively. He is currently a full professor at the Shipborne Command & Control Department, NUE. His research interests include nonlinear control of mechanical systems with applications to robotics and unmanned surface vehicles. E-mail: liuzh531@163.com

    XIA Qingtao was born in 1978. He received his B.S. and M.S. degrees from Naval University of Engineering (NUE), Wuhan, China, in 2001 and 2012, respectively. He is a lecturer in the School of Weaponry Engineering, NUE. His research interests include unmanned system warfare and command and control systems. E-mail: xiaqing777@163.com
  • Supported by:
    This work was supported by the National Natural Science Foundation (61601491), the Natural Science Foundation of Hubei Province (2018CFC865), and the China Postdoctoral Science Foundation Funded Project (2016T45686).

Abstract:

To solve the path following control problem for unmanned surface vehicles (USVs), a control method based on deep reinforcement learning (DRL) with long short-term memory (LSTM) networks is proposed. A distributed proximal policy optimization (DPPO) algorithm, a modified actor-critic reinforcement learning algorithm, is adapted to improve controller performance over repeated trials. The LSTM network structure is introduced to handle the strong temporal correlation inherent in USV control. In addition, a specially designed path dataset, including straight and curved paths, is established to simulate various sailing scenarios so that the reinforcement learning controller can obtain as much handling experience as possible. Extensive numerical simulation results demonstrate that the proposed method achieves better control performance in missions involving complex maneuvers than controllers trained with limited scenarios, and that it can potentially be applied in practice.

Key words: unmanned surface vehicle (USV), deep reinforcement learning (DRL), path following, path dataset, proximal policy optimization, long short-term memory (LSTM)
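
As a rough illustration of the proximal policy optimization component named in the abstract, the sketch below computes the standard clipped surrogate objective used by PPO-style algorithms. It is not the authors' implementation: the function name `ppo_clip_objective`, the default `clip_eps=0.2`, and the plain-list inputs are assumptions made for this example, and the distributed (DPPO) workers and the LSTM policy network are omitted.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range; 0.2 is a common choice, assumed here.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # The ratio is 2.0, the advantage is positive, so the gain is capped at 1.2.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual motivation for PPO-style methods in control tasks trained over repeated trials.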