
Journal of Systems Engineering and Electronics, 2023, Vol. 34, Issue 1: 117-128. doi: 10.23919/JSEE.2023.000036
• SYSTEMS ENGINEERING •
					
Guangran CHENG1,2, Lu DONG3, Xin YUAN1, Changyin SUN1,2,*
Received: 2021-12-29
Online: 2023-02-18
Published: 2023-03-03
Contact: Changyin SUN
E-mail: chenggr@seu.edu.cn; ldong90@seu.edu.cn; xinyuan@seu.edu.cn; cysun@seu.edu.cn

Citation: Guangran CHENG, Lu DONG, Xin YUAN, Changyin SUN. Reinforcement learning-based scheduling of multi-battery energy storage system[J]. Journal of Systems Engineering and Electronics, 2023, 34(1): 117-128.
													
Table 1  Parameters of the energy storage system

| Parameter | Battery 1 | Battery 2 | Battery 3 | Battery 4 |
|---|---|---|---|---|
|  | 0.958 | 0.898 | 0.858 | 0.798 |
|  | 0.073 | 0.073 | 0.073 | 0.073 |
|  | 1.8 | 1.6 | 1.0 | 0.3 |
|  | 11 | 9 | 7 | 5 |
|  | −0.9 | −0.8 | −0.7 | −0.6 |
|  | 0.9 | 0.8 | 0.7 | 0.6 |
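The parameter symbols in the first column of Table 1 did not survive extraction, so their exact meanings are unclear. As a heavily hedged illustration only, the sketch below shows how per-battery quantities of this kind typically enter a state-of-charge (SoC) update in a multi-battery scheduling model: an efficiency, an energy capacity, and symmetric charge/discharge power bounds (the paired −0.9/0.9-style rows suggest bounds). All names and the mapping of rows to roles are assumptions, not the paper's definitions.

```python
def soc_step(soc, power, capacity, eta=0.9, p_min=-0.9, p_max=0.9, dt=1.0):
    """Advance SoC one step; positive power charges, negative discharges.

    All parameter roles (efficiency eta, capacity, power bounds) are
    illustrative assumptions -- the original symbols were lost in extraction.
    """
    p = max(p_min, min(p_max, power))        # clip to the battery's power limits
    if p >= 0:
        delta = eta * p * dt / capacity      # charging loses some energy to efficiency
    else:
        delta = p * dt / (eta * capacity)    # discharging draws extra energy from the cell
    return max(0.0, min(1.0, soc + delta))   # keep SoC within [0, 1]

# Example with illustrative values resembling Battery 1's column:
print(round(soc_step(0.5, 0.5, capacity=1.8), 4))  # -> 0.75
```

The clipping of both power and SoC mirrors the hard operating limits a scheduler must respect when dispatching each battery.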
 
													
Table 2  Offline training parameters

| Parameter | Value |
|---|---|
| Discount factor | 0.85 |
| Learning rate | 0.001 |
| Soft update rate | 0.01 |
| Replay buffer size | 100 000 |
| Minibatch size | 32 |
| MaxStep | 168 |
| Weighted coefficients | 0.2, −0.4 |
| Penalty item | −200 |
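The "soft update rate" and "replay buffer size" rows in Table 2 indicate an off-policy actor-critic learner with target networks. The sketch below shows where each hyperparameter would enter such a training loop; the function names and structure are illustrative assumptions, not the authors' implementation.

```python
GAMMA = 0.85           # discount factor (Table 2)
LR = 0.001             # learning rate (Table 2)
TAU = 0.01             # soft update rate for the target networks (Table 2)
BUFFER_SIZE = 100_000  # replay buffer capacity (Table 2)
BATCH_SIZE = 32        # minibatch size per gradient step (Table 2)
MAX_STEP = 168         # steps per episode; 168 = 24 * 7, i.e., a week of
                       # hourly decisions if the time step is one hour (assumption)

def soft_update(target, source, tau=TAU):
    """Blend learned parameters into the target network:
    theta_target <- tau * theta_source + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for s, t in zip(source, target)]

def td_target(reward, next_q, done, gamma=GAMMA):
    """One-step temporal-difference target used to train the critic."""
    return reward if done else reward + gamma * next_q
```

With a soft update rate of 0.01, the target networks track the learned networks slowly, which is what stabilizes bootstrapped critic training in this family of algorithms.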