Journal of Systems Engineering and Electronics ›› 2022, Vol. 33 ›› Issue (6): 1159-1175.

• SYSTEMS ENGINEERING •

### A deep reinforcement learning method for multi-stage equipment development planning in uncertain environments

Peng LIU, Boyuan XIA, Zhiwei YANG, Jichao LI, Yuejin TAN

1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
• Received: 2021-01-11 Online: 2022-12-18 Published: 2022-12-24
• Contact: Jichao LI E-mail: liupeng81@nudt.edu.cn; xiaboyuan11@nudt.edu.cn; zhwyang88@126.com; ljcnudt@hotmail.com; yjtan@nudt.edu.cn
• About author:
LIU Peng was born in 1981. He received his M.S. degree in higher education management from the National University of Defense Technology (NUDT), Changsha, China, in 2008. He received his Ph.D. degree in management science and engineering from NUDT in 2020. His research interests include system of systems engineering, complex systems, and systems evaluation and optimization. E-mail: liupeng81@nudt.edu.cn

XIA Boyuan was born in 1994. He received his B.S. and M.S. degrees in management science and engineering from the National University of Defense Technology (NUDT), Changsha, China, in 2015 and 2017, respectively. He is currently a Ph.D. candidate in the College of Systems Engineering, NUDT. His research interests include system of systems engineering, complex systems, and systems evaluation and optimization. E-mail: xiaboyuan11@nudt.edu.cn

YANG Zhiwei was born in 1988. He received his B.E. degree in management science and M.E. degree in management science and engineering from the National University of Defense Technology, Changsha, Hunan, China, in 2010 and 2012, respectively. He received his Ph.D. degree in computer science from Leiden University, Leiden, the Netherlands, in 2016. His research interests include intelligence computing and the evaluation of complex systems. E-mail: zhwyang88@126.com

LI Jichao was born in 1990. He received his B.E. degree in management science and his M.E. and Ph.D. degrees in management science and engineering from the National University of Defense Technology, Changsha, Hunan, China, in 2013, 2015, and 2019, respectively. His research interests include studying complex systems with a combination of theoretical tools and data analysis, including mathematical modeling of heterogeneous information networks, applying network methodologies to analyze the development of complex systems of systems, and data-driven study of the collective behavior of humans. E-mail: ljcnudt@hotmail.com

TAN Yuejin was born in 1958. He received his B.E. degree in mathematics from Hunan Normal University and his M.E. degree in systems engineering from the National University of Defense Technology, Changsha, Hunan, China, in 1981 and 1985, respectively. His research interests include system of systems (SoS) requirements modeling, SoS architecture design and optimization, complex networks, and system modeling and simulation. E-mail: yjtan@nudt.edu.cn
• Supported by:
This work was supported by the National Natural Science Foundation of China (71690233; 72001209) and the Scientific Research Foundation of the National University of Defense Technology (ZK19-16).

Abstract:

Equipment development planning (EDP) is usually a long-term process, often performed in a highly uncertain environment. Traditional multi-stage dynamic programming cannot cope with this kind of uncertainty, in which unpredictable situations arise. To address this problem, a multi-stage EDP model based on a deep reinforcement learning (DRL) algorithm is proposed that responds quickly to environmental changes within a reasonable range. First, the basic problem of multi-stage EDP is described and a mathematical planning model is constructed. Then, for two kinds of uncertainty (future capability requirements and the amount of investment in each stage), a corresponding DRL framework is designed that defines the environment, state, action, and reward function for multi-stage EDP. Next, the dueling deep Q-network (Dueling DQN) algorithm is used to solve the multi-stage EDP problem and generate an approximately optimal multi-stage equipment development scheme. Finally, a case of ten kinds of equipment in 100 randomly generated possible environments is used to test the feasibility and effectiveness of the proposed models. The results show that the algorithm can respond instantaneously in any state of the multi-stage EDP environment and, unlike traditional algorithms, does not need to re-optimize the problem whenever the environment changes. In addition, when the equipment capability requirements change, the algorithm can flexibly adjust the subsequent planning stages to adapt to the new requirements.
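
For readers unfamiliar with Dueling DQN, the sketch below illustrates the standard dueling aggregation step, in which a state value V(s) and per-action advantages A(s, a) are combined as Q(s, a) = V(s) + A(s, a) - mean(A). It is a minimal illustration of the general technique, not the authors' implementation: the function names `dueling_q_values` and `greedy_action` and the numeric inputs are hypothetical, and the abstract does not specify the network architecture or the state and action encodings used in the paper.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-subtracted dueling form."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values):
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical numbers: one planning state with three candidate development actions.
# In a trained network, state_value and advantages would come from two output heads.
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
best = greedy_action(q)
print(q, best)
```

Subtracting the mean advantage keeps the value and advantage streams identifiable. Once such a network is trained, choosing an action in a changed state needs only a forward pass and an argmax, which is consistent with the fast response the abstract reports.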