Hybrid Q-learning for data-based optimal control of non-linear switching system

doi:10.23919/JSEE.2022.000114

Journal of Systems Engineering and Electronics, 2022, Vol. 33, Issue (5): 1186-1194.

• CONTROL THEORY AND APPLICATION •

Xiaofeng LI¹,², Lu DONG³, Changyin SUN¹,²,*

¹ School of Automation, Southeast University, Nanjing 210096, China
² School of Artificial Intelligence, Anhui University, Hefei 230601, China
³ School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China

Received: 2021-01-13 Accepted: 2022-07-22 Online: 2022-10-27 Published: 2022-10-27
Contact: Changyin SUN E-mail: 230169413@seu.edu.cn; ldong90@seu.edu.cn; cysun@seu.edu.cn
About author:
LI Xiaofeng was born in 1990. He received his B.S. and M.S. degrees in engineering from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2012 and 2016, respectively, and his Ph.D. degree in control science and engineering from Southeast University, Nanjing, China, in 2021. He is currently a postdoctoral researcher with the School of Artificial Intelligence, Anhui University, Hefei, China. He was a joint Ph.D. student with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA, from 2018 to 2019. His current research interests include reinforcement learning, adaptive dynamic programming, robot systems, and optimal control. E-mail: 230169413@seu.edu.cn
DONG Lu was born in 1990. She received her B.S. degree in physics and Ph.D. degree in electrical engineering from Southeast University, Nanjing, China, in 2012 and 2017, respectively. She is currently an associate professor with the School of Cyber Science and Engineering, Southeast University, Nanjing, China. Her current research interests include adaptive dynamic programming, event-triggered control, nonlinear system control, and optimization. E-mail: ldong90@seu.edu.cn
SUN Changyin was born in 1975. He received his B.S. degree in applied mathematics from the College of Mathematics, Sichuan University, Chengdu, China, in 1996, and his M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 2001 and 2004, respectively. He is currently a professor with the School of Automation, Southeast University, Nanjing, China. His current research interests include intelligent control, flight control, and optimal theory. E-mail: cysun@seu.edu.cn
Supported by:
This work was supported by the National Key R&D Program of China (2018AAA0101400), the Natural Science Foundation of Jiangsu Province of China (BK20202006), and the National Natural Science Foundation of China (61921004; 62173251).

Abstract

This paper investigates the optimal control of non-linear switching systems without knowledge of the system dynamics. First, the Hamilton-Jacobi-Bellman (HJB) equation is derived for the hybrid action space. Then, a novel data-based hybrid Q-learning (HQL) algorithm is proposed to find the optimal solution iteratively. In addition, a theoretical analysis is provided to establish the convergence and optimality of the proposed algorithm. Finally, the algorithm is implemented with an actor-critic (AC) structure, in which two linear-in-parameter neural networks approximate the functions. Simulation results validate the effectiveness of the data-driven method.

Key words: switching system, hybrid action space, optimal control, reinforcement learning, hybrid Q-learning (HQL)
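
The idea of iterating a Q-function over a hybrid action (a discrete mode together with a continuous control input) can be illustrated with a minimal sketch. This is not the authors' HQL algorithm or their actor-critic implementation. It is a toy value-iteration example under stated assumptions: a hypothetical two-mode scalar switched system, a quadratic stage cost, and a quadratic basis for a linear-in-parameter critic, all chosen here only for illustration. The learner fits the critic weights by least squares from sampled transitions and never reads the system coefficients.

```python
import numpy as np

# Hypothetical two-mode scalar switched system x' = A[v]*x + B[v]*u (illustration only).
A = np.array([1.2, 0.8])
B = np.array([0.5, 1.0])
Q_COST, R_COST = 1.0, 1.0  # assumed stage cost U(x, u) = Q_COST*x^2 + R_COST*u^2


def step(x, v, u):
    """Simulator used only to generate data; the learner never reads A or B."""
    return A[v] * x + B[v] * u


def features(x, u):
    """Quadratic basis for a linear-in-parameter critic Q_v(x, u) = w_v . phi(x, u)."""
    return np.stack([x * x, x * u, u * u], axis=-1)


def greedy(w, x):
    """Minimize the critic over the hybrid action (mode v, control u) at state x."""
    best = None
    for v, (w0, w1, w2) in enumerate(w):
        # Closed-form minimizer of the quadratic in u; fall back to u = 0 if not convex.
        u = -w1 * x / (2.0 * w2) if w2 > 1e-12 else 0.0
        q = w0 * x * x + w1 * x * u + w2 * u * u
        if best is None or q < best[0]:
            best = (q, v, u)
    return best  # (minimal Q-value, mode, control)


def hybrid_q_iteration(n_iter=50, n_samples=200, seed=0):
    """Value-iteration style Q-learning from sampled transitions, starting at Q_0 = 0."""
    rng = np.random.default_rng(seed)
    n_modes = len(A)
    w = np.zeros((n_modes, 3))
    x = rng.uniform(-2.0, 2.0, n_samples)
    u = rng.uniform(-2.0, 2.0, n_samples)
    phi = features(x, u)
    stage_cost = Q_COST * x**2 + R_COST * u**2
    for _ in range(n_iter):
        w_next = np.empty_like(w)
        for v in range(n_modes):
            x_next = step(x, v, u)  # observed next states for mode v
            v_next = np.array([greedy(w, xn)[0] for xn in x_next])
            # Least-squares fit of the critic weights to the bootstrapped targets.
            w_next[v], *_ = np.linalg.lstsq(phi, stage_cost + v_next, rcond=None)
        w = w_next
    return w


if __name__ == "__main__":
    weights = hybrid_q_iteration()
    cost, mode, control = greedy(weights, 1.0)
    print(f"x=1.0: mode={mode}, u={control:.4f}, value={cost:.4f}")
```

In this toy setting the value function stays quadratic, so the same mode is preferred at every state; a richer basis and a non-linear system would be needed to observe state-dependent switching.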

Xiaofeng LI, Lu DONG, Changyin SUN. Hybrid Q-learning for data-based optimal control of non-linear switching system[J]. Journal of Systems Engineering and Electronics, 2022, 33(5): 1186-1194.

Figures/Tables: 10 (Fig. 1 to Fig. 10)
