Journal of Systems Engineering and Electronics ›› 2023, Vol. 34 ›› Issue (1): 117-128.doi: 10.23919/JSEE.2023.000036

• SYSTEMS ENGINEERING •

Reinforcement learning-based scheduling of multi-battery energy storage system

Guangran CHENG1,2, Lu DONG3, Xin YUAN1, Changyin SUN1,2,*

  1. 1 School of Automation, Southeast University, Nanjing 210096, China
    2 Peng Cheng Laboratory, Shenzhen 518066, China
    3 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
  • Received: 2021-12-29 Online: 2023-02-18 Published: 2023-03-03
  • Contact: Changyin SUN E-mail: chenggr@seu.edu.cn; ldong90@seu.edu.cn; xinyuan@seu.edu.cn; cysun@seu.edu.cn
  • About author:
    CHENG Guangran was born in 1996. She received her B.S. degree in automation from Nanjing University of Science and Technology, Nanjing, China, in 2018. She is currently working toward her Ph.D. degree in control science and engineering at the School of Automation, Southeast University, Nanjing, China. Her current research interests include reinforcement learning, multi-objective learning, and robot navigation. E-mail: chenggr@seu.edu.cn

    DONG Lu was born in 1990. She received her B.S. degree in the School of Physics and Ph.D. degree in the School of Automation from Southeast University, Nanjing, China, in 2012 and 2017, respectively. She is currently an associate professor with the School of Cyber Science and Engineering, Southeast University. Her current research interests include adaptive dynamic programming, event-triggered control, nonlinear system control, and optimization. E-mail: ldong90@seu.edu.cn

    YUAN Xin was born in 1989. He received his B.S. degree in electrical engineering and M.S. degree in vehicle operation engineering from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2012 and 2015, respectively, and Ph.D. degree in control science and engineering from Southeast University, Nanjing, China, in 2021. Currently, he is working as a postdoctoral researcher with the School of Automation, Southeast University, Nanjing, China. He was a joint Ph.D. student with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA, from 2017 to 2018. His current research interests include reinforcement learning, unmanned aerial vehicle, and optimal control. E-mail: xinyuan@seu.edu.cn

    SUN Changyin was born in 1975. He received his B.S. degree in applied mathematics from the College of Mathematics, Sichuan University, Chengdu, China, in 1996, and M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 2001 and 2004, respectively. He is currently a professor with the School of Automation, Southeast University, Nanjing, China. His current research interests include intelligent control, flight control, and optimal theory. E-mail: cysun@seu.edu.cn
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (61921004, 62173251, U1713209, 62236002), the Fundamental Research Funds for the Central Universities, and the Guangdong Provincial Key Laboratory of Intelligent Decision and Cooperative Control.

Abstract:

In this paper, a reinforcement learning-based multi-battery energy storage system (MBESS) scheduling policy is proposed to minimize consumers' electricity cost. The MBESS scheduling problem is modeled as a Markov decision process (MDP) with unknown transition probabilities. Because the electricity price and residential load are periodic, the optimal value function is time-dependent and difficult to obtain. Therefore, a series of time-independent action-value functions is proposed, one for each period of the day. Each action-value function is approximated by a corresponding critic network, and the critic networks are cascaded according to the time sequence. The continuous management strategy is then obtained from the associated action network. Moreover, a two-stage learning protocol comprising offline and online learning stages is provided for practical implementation in real-time battery management. Numerical experiments demonstrate the effectiveness of the developed algorithm.
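The core idea of period-indexed value functions cascaded in time can be illustrated with a toy tabular sketch. All names, dynamics, prices, and dimensions below are illustrative assumptions, not the paper's experimental setup: a single battery with a small discrete state of charge, a 24-period day, and a sinusoidal price signal. The backup for period t bootstraps from the value function of period (t + 1) mod T, mirroring the cascaded critics described above:

```python
import numpy as np

# Toy periodic MDP (illustrative assumptions only).
T = 24                       # periods per day (hourly)
SOC = np.arange(0, 5)        # discrete battery state-of-charge levels
ACTS = np.array([-1, 0, 1])  # discharge / idle / charge one unit
price = 0.2 + 0.1 * np.sin(2 * np.pi * np.arange(T) / T)  # periodic price
load = 1.0                   # constant residential load (energy units)
gamma = 0.95                 # discount factor

# One action-value table per period; the cascade ties period t to
# period (t + 1) mod T, so every table is time-independent within
# its own slot but periodic across the day.
Q = np.zeros((T, len(SOC), len(ACTS)))

def step(soc, a):
    """Battery dynamics: state of charge clipped to its bounds."""
    return int(np.clip(soc + a, SOC[0], SOC[-1]))

for sweep in range(200):                       # periodic value iteration
    for t in reversed(range(T)):
        tn = (t + 1) % T                       # cascade to next period's table
        for i, soc in enumerate(SOC):
            for j, a in enumerate(ACTS):
                nxt = step(soc, a)
                grid = load + (nxt - soc)      # energy bought from the grid
                r = -price[t] * max(grid, 0)   # negative cost as reward
                Q[t, i, j] = r + gamma * Q[tn, nxt].max()

# Greedy scheduling policy for each period and state of charge.
policy = Q.argmax(axis=2)
```

In the paper's setting the tables are replaced by cascaded critic networks (and a continuous action network) learned from data, since the transition probabilities and price/load processes are unknown; this sketch only shows the periodic structure of the backup.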

Key words: multi-battery energy storage system (MBESS), reinforcement learning, periodic value iteration, data-driven