A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning

Journal of Systems Engineering and Electronics, 2021, Vol. 32, Issue (3): 642-657. doi: 10.23919/JSEE.2021.000055

Ye MA¹,*, Tianqing CHANG¹, Wenhui FAN²

¹ Academy of Army Armored Force, Beijing 100072, China
² Department of Automation, Tsinghua University, Beijing 100084, China

Received: 2020-03-03 Online: 2021-06-18 Published: 2021-07-26
Contact: Ye MA E-mail: mayegf@126.com; oliver_chan1214@126.com; fanwenhui@tsinghua.edu.cn
About the authors:

MA Ye was born in 1993. She received her master’s degree from the Academy of Army Armored Force, Beijing, in 2017. She is currently a Ph.D. candidate at the Academy of Army Armored Force. Her research interests include intelligent control system technology and the modeling and simulation of complex systems. E-mail: mayegf@126.com

CHANG Tianqing was born in 1963. He received his Ph.D. degree in concurrent engineering from Tsinghua University in 1999. Since 2000, he has been a professor with the Academy of Army Armored Force. His current research interests include target detection and recognition, as well as navigation, guidance and control. E-mail: oliver_chan1214@126.com

FAN Wenhui was born in 1968. He received his Ph.D. degree in control science and engineering from Zhejiang University in 1998. He is currently a professor, doctoral supervisor, and vice-director in the Department of Automation, Tsinghua University. His research interests include modeling and simulation of complex systems, product information integration modeling technology, product lifecycle management technology, and collaborative design platform technology. E-mail: fanwenhui@tsinghua.edu.cn
Supported by:
This work was supported by the National Key R&D Program of China (2017YFB1400105).

Abstract:

When a group plays an evolutionary game over a single shared task, changes in the game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and on the game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules of the game process. The model can improve the result of an evolutionary game and facilitate completion of the task. First, based on multi-agent theory, a negative feedback tax penalty mechanism is proposed to address problems in the original model and to guide the strategy selection of individuals in the group. In addition, a method for calculating the group intelligence level is defined to evaluate the evolutionary game results of the group. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. Within the model, the selection strategy of the Q-learning algorithm is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that, under different negative feedback factor values and different group sizes, the proposed model effectively guides individuals to choose cooperation strategies that benefit task completion and stability, thereby improving the group intelligence level.
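
For readers unfamiliar with the Q-learning algorithm named in the abstract, the following is a minimal sketch of the standard tabular update and a plain epsilon-greedy selection rule. It is not the improved selection strategy or the negative feedback tax penalty mechanism of the paper, whose details the abstract does not give. The three action names follow the strategy types listed in the results table; the state label, reward value, learning rate `alpha`, and discount `gamma` are illustrative assumptions.

```python
import random

# Strategy types as named in the results table of this page.
ACTIONS = ["cooperation", "competition", "inaction"]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (alpha and gamma are assumed values).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def epsilon_greedy(q, state, epsilon, rng):
    """Plain epsilon-greedy choice; a stand-in, not the paper's selection rule."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    q_table = {}
    rng = random.Random(0)
    # Hypothetical single-state game with a placeholder reward of 1.0.
    q_update(q_table, "s", "cooperation", 1.0, "s")
    print(epsilon_greedy(q_table, "s", 0.0, rng))
```

Any reward shaping, such as a tax-style penalty, would enter through the `reward` argument; how the paper computes it is not stated here.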

Key words: multi-agent, reinforcement learning, evolutionary game, Q-learning

Ye MA, Tianqing CHANG, Wenhui FAN. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning[J]. Journal of Systems Engineering and Electronics, 2021, 32(3): 642-657.

Figures/Tables: 19

Combination of strategies	Winner	Combination of strategies	Winner
Co Co Co	Co Co Co	Co D L	D
Co Co D	Co D (any one of Co)	D D D	D (any one of D)
Co Co L	Co Co	D L L	D
Co D D	D (any one of D)	D D L	D (any one of D)
Co L L	Co	L L L	none
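
The outcome table above can be read as an order-insensitive lookup from a three-player strategy combination to its winners. The sketch below transcribes the ten rows as they appear; the dictionary layout, the `tie_break` flag (marking rows annotated "any one of"), and the function name `winners` are illustrative assumptions, not part of the paper.

```python
# Rows transcribed from the table; keys are alphabetically sorted strategy triples.
# Each value is (winning strategies, tie_break), where tie_break is True for rows
# the table annotates with "any one of", i.e. one of the tied players is picked.
OUTCOMES = {
    ("Co", "Co", "Co"): (("Co", "Co", "Co"), False),
    ("Co", "Co", "D"): (("Co", "D"), True),
    ("Co", "Co", "L"): (("Co", "Co"), False),
    ("Co", "D", "D"): (("D",), True),
    ("Co", "L", "L"): (("Co",), False),
    ("Co", "D", "L"): (("D",), False),
    ("D", "D", "D"): (("D",), True),
    ("D", "L", "L"): (("D",), False),
    ("D", "D", "L"): (("D",), True),
    ("L", "L", "L"): ((), False),
}


def winners(strategies):
    """Return (winning strategies, tie_break) for three strategies in any order."""
    key = tuple(sorted(strategies))
    return OUTCOMES[key]


if __name__ == "__main__":
    print(winners(["L", "D", "Co"]))
    print(winners(["L", "L", "L"]))
```

The ten keys cover every unordered combination of three players choosing among three strategies, so the lookup is total for valid input.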

Algorithm	Proportion of cooperation strategies	Proportion of competition strategies	Proportion of inaction strategies
Algorithm of this article	0.54	0.23	0.23
Nash-Q learning algorithm	0.51	0.25	0.24
Monte Carlo method	0.47	0.27	0.26
Genetic algorithm	0.45	0.26	0.29
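
As a quick consistency check on the comparison table above, the sketch below confirms that each row's three proportions sum to one and ranks the algorithms by their share of cooperation strategies. The values are copied from the table; the variable and function names are illustrative assumptions.

```python
# (cooperation, competition, inaction) proportions copied from the table.
RESULTS = {
    "Algorithm of this article": (0.54, 0.23, 0.23),
    "Nash-Q learning algorithm": (0.51, 0.25, 0.24),
    "Monte Carlo method": (0.47, 0.27, 0.26),
    "Genetic algorithm": (0.45, 0.26, 0.29),
}


def rows_sum_to_one(results, tol=1e-9):
    """Check that every row's proportions add up to 1 within a tolerance."""
    return all(abs(sum(row) - 1.0) <= tol for row in results.values())


def rank_by_cooperation(results):
    """Sort algorithm names by cooperation proportion, highest first."""
    return sorted(results, key=lambda name: results[name][0], reverse=True)


if __name__ == "__main__":
    print(rows_sum_to_one(RESULTS))
    for name in rank_by_cooperation(RESULTS):
        print(f"{name}: {RESULTS[name][0]:.2f}")
```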
