Reinforcement learning based parameter optimization of active disturbance rejection control for autonomous underwater vehicle

doi:10.23919/JSEE.2022.000017

Journal of Systems Engineering and Electronics, 2022, Vol. 33, Issue 1: 170-179.

• CONTROL THEORY AND APPLICATION •

Wanping SONG¹, Zengqiang CHEN¹,²,*, Mingwei SUN¹, Qinglin SUN¹

¹ College of Artificial Intelligence, Nankai University, Tianjin 300350, China
² Key Laboratory of Intelligent Robotics of Tianjin, Nankai University, Tianjin 300350, China

Received: 2020-11-26 Accepted: 2021-11-24 Online: 2022-01-18 Published: 2022-02-22
Contact: Zengqiang CHEN E-mail: 1422501596@qq.com; chenzq@nankai.edu.cn; smw_sunmingwei@163.com; sunql@nankai.edu.cn
About author:

SONG Wanping was born in 1998. She received her B.S. degree from Nanjing Agricultural University, Nanjing, China, in 2019. She is currently a graduate student at Nankai University, Tianjin, China. Her current research interests include active disturbance rejection control and reinforcement learning. E-mail: 1422501596@qq.com

CHEN Zengqiang was born in 1964. He received his B.S., M.E., and Ph.D. degrees from Nankai University in 1987, 1990, and 1997, respectively. He is currently a professor of control theory and engineering at Nankai University and deputy director of the Institute of Robotics and Information Automation. His current research interests include intelligent predictive control, chaotic systems and complex dynamic networks, and multi-agent system control. E-mail: chenzq@nankai.edu.cn

SUN Mingwei was born in 1972. He received his Ph.D. degree from the Department of Computer and Systems Science, Nankai University, Tianjin, China, in 2000. From 2000 to 2008, he was a flight control engineer with Beijing Electro-mechanical Engineering Research Institute, Beijing, China. Since 2009, he has been with Nankai University as a professor. His research interests include flight control, guidance, model predictive control, active disturbance rejection control, and nonlinear optimization. E-mail: smw_sunmingwei@163.com

SUN Qinglin received his B.E. and M.E. degrees in control theory and control engineering from Tianjin University, Tianjin, China, in 1985 and 1990, respectively, and his Ph.D. degree in control science and engineering from Nankai University, Tianjin, China, in 2003. He is currently a professor at the Intelligence Predictive Adaptive Control Laboratory of Nankai University and associate dean of the College of Artificial Intelligence. His research interests include self-adaptive control, modeling and control of flexible spacecraft, and embedded control systems. E-mail: sunql@nankai.edu.cn
Supported by:
This work was supported by the National Natural Science Foundation of China (61973175; 61973172) and Tianjin Natural Science Foundation (19JCZDJC32800).

Abstract

This paper proposes a linear active disturbance rejection control (LADRC) method, tuned by the Q-learning algorithm of reinforcement learning (RL), to control the six-degree-of-freedom motion of an autonomous underwater vehicle (AUV). The number of controllers is increased to decouple the AUV motion. To keep the algorithm from growing too large, a simplified Q-learning algorithm is constructed around the controlled quantities to adapt the parameters of the LADRC controller. Finally, simulation experiments comparing a controller with fixed parameters against the controller based on the Q-learning algorithm verify the rationality of the simplified algorithm, the effectiveness of the parameter adaptation, and the advantages of the LADRC controller.

Key words: autonomous underwater vehicle (AUV), reinforcement learning (RL), Q-learning, linear active disturbance rejection control (LADRC), motion decoupling, parameter optimization
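
As a rough illustration of the Q-learning step mentioned in the abstract, the sketch below shows a generic tabular Q-learning update in Python. It is not the simplified algorithm constructed in the paper. The state count, the three-action set (decrease, keep, or increase a controller parameter), and the learning-rate and discount values are all assumptions chosen for illustration.

```python
import random

# Hypothetical action set: step a controller parameter down, hold it, or step it up.
ACTIONS = (-1, 0, +1)


def make_q_table(n_states, n_actions=len(ACTIONS)):
    """Create a Q-table of zeros with one row per discrete state."""
    return [[0.0] * n_actions for _ in range(n_states)]


def choose_action(q_table, state, epsilon, rng=random):
    """Epsilon-greedy selection: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[state]))
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply the standard update Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    td_target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]
```

In a tuning loop, the chosen entry of `ACTIONS` would scale some parameter step before the next control interval, and the reward would be derived from the tracking error. Both choices are left open here because the abstract does not specify them.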

Wanping SONG, Zengqiang CHEN, Mingwei SUN, Qinglin SUN. Reinforcement learning based parameter optimization of active disturbance rejection control for autonomous underwater vehicle[J]. Journal of Systems Engineering and Electronics, 2022, 33(1): 170-179.

Motion	Position and angle	Linear and angular velocity	Force and moment
Surge	$ x $	$ u $	$ X $
Sway	$ y $	$ v $	$ Y $
Heave	$ z $	$ w $	$ Z $
Roll	$ \varphi $	$ p $	$ K $
Pitch	$ \theta $	$ q $	$ M $
Yaw	$ \psi $	$ r $	$ N $

$ \theta $/rad \ $ e $/m	(−11,−0.3]	(−0.3,−0.1]	(−0.1,0.1)	[0.1,0.3)	[0.3,11)
(−1,−0.3]	1	6	11	16	21
(−0.3,−0.1]	2	7	12	17	22
(−0.1,0.1)	3	8	13	18	23
[0.1,0.3)	4	9	14	19	24
[0.3,1)	5	10	15	20	25
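
The table above can be read as a lookup from a pair ($ \theta $, $ e $) to one of 25 discrete states, numbered down the columns. A minimal Python sketch of that lookup follows. The function names are hypothetical, and the outer bounds (±1 rad for $ \theta $, ±11 m for $ e $) are an assumption recovered from the partly garbled table headers.

```python
# Assumed outer bounds, read from the table headers (the source text is partly garbled).
THETA_LIMIT = 1.0   # rad
E_LIMIT = 11.0      # m


def bin_index(x, limit):
    """Return the 0-based interval index of x among the five intervals
    (-limit,-0.3], (-0.3,-0.1], (-0.1,0.1), [0.1,0.3), [0.3,limit)."""
    if not -limit < x < limit:
        raise ValueError(f"value {x} lies outside (-{limit}, {limit})")
    if x <= -0.3:
        return 0
    if x <= -0.1:
        return 1
    if x < 0.1:
        return 2
    if x < 0.3:
        return 3
    return 4


def state_index(theta, e):
    """Map (theta, e) to the 1-based state number used in the table.
    States are numbered down each column, so the e interval selects the
    block of five and the theta interval selects the row within it."""
    return 5 * bin_index(e, E_LIMIT) + bin_index(theta, THETA_LIMIT) + 1
```

For example, `state_index(0.0, 0.0)` falls in the central cell, state 13. The closed and open interval ends follow the table, so $ \theta = -0.3 $ belongs to the first row while $ e = 0.1 $ belongs to the fourth column.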