A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning

doi:10.23919/JSEE.2021.000055

Abstract

In an evolutionary game in which a group works on the same task, changes in the game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and on the game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules that emerge during the game. The model can improve the outcome of an evolutionary game and facilitate completion of the task. First, based on multi-agent theory, a negative feedback tax penalty mechanism is proposed to address the problems of the original model and to guide the strategy selection of individuals in the group. In addition, a method for calculating the group intelligence level is defined to evaluate the evolutionary game results of the group. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. Within the model, the selection strategy of the Q-learning algorithm is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that, under different negative feedback factor values and different group sizes, the proposed model effectively guides individuals to choose cooperation strategies that benefit task completion and stability, thereby improving the group intelligence level.
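For readers unfamiliar with the Q-learning algorithm mentioned in the abstract, the following is a minimal sketch of the textbook tabular update with epsilon-greedy action selection. It is not the improved selection strategy or the negative feedback tax penalty mechanism of the article, whose details are not given here. The action names follow the strategy types in the results table; the function names, the state labels, and the values of `alpha`, `gamma`, and `epsilon` are illustrative assumptions.

```python
import random

# Strategy names follow the results table; everything else here is hypothetical.
ACTIONS = ("cooperate", "compete", "inaction")


def new_q_table(states):
    """Create a Q-table with every state-action value initialised to zero."""
    return {s: {a: 0.0 for a in ACTIONS} for s in states}


def select_action(q_row, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row.values())
    # Break ties between equally valued actions at random.
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

A typical loop would call `select_action` on the current state's row, observe a reward from the game, and then call `q_update`; how the reward is shaped by the tax penalty is left open in this sketch.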

Keywords: multi-agent, reinforcement learning, evolutionary game, Q-learning

Ye MA, Tianqing CHANG, Wenhui FAN. A single-task and multi-decision evolutionary game model based on multi-agent reinforcement learning[J]. Journal of Systems Engineering and Electronics, 2021, 32(3): 642-657.

Combination of strategies	Winner
Co Co Co	Co Co Co
Co Co D	Co D (any one of Co)
Co Co L	Co Co
Co D D	D (any one of D)
Co L L	Co
Co D L	D
D D D	D (any one of D)
D L L	D
D D L	D (any one of D)
L L L	none
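The outcome rules in the table above can be encoded as a lookup keyed by how many players chose each strategy. The sketch below is one possible transcription, not code from the article. It assumes that "Co D (any one of Co)" means one of the two Co players wins together with the D player, and that "(any one of D)" means exactly one D player wins; the names `WINNERS` and `winners` are hypothetical.

```python
from collections import Counter

# Key: (number of Co, number of D, number of L) among the three players.
# Value: how many winners there are per strategy, transcribed from the table.
WINNERS = {
    (3, 0, 0): {"Co": 3},
    (2, 1, 0): {"Co": 1, "D": 1},  # assumed reading of "Co D (any one of Co)"
    (2, 0, 1): {"Co": 2},
    (1, 2, 0): {"D": 1},           # any one of the D players
    (1, 0, 2): {"Co": 1},
    (1, 1, 1): {"D": 1},
    (0, 3, 0): {"D": 1},           # any one of the D players
    (0, 1, 2): {"D": 1},
    (0, 2, 1): {"D": 1},           # any one of the D players
    (0, 0, 3): {},                 # no winner
}


def winners(strategies):
    """Return the number of winners per strategy for a three-player combination."""
    counts = Counter(strategies)
    key = (counts["Co"], counts["D"], counts["L"])
    if len(strategies) != 3 or sum(key) != 3:
        raise ValueError("expected exactly three strategies drawn from Co, D, L")
    return WINNERS[key]
```

Because the key depends only on counts, the order of the players does not matter: `winners(["D", "Co", "Co"])` and `winners(["Co", "Co", "D"])` look up the same row.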

Algorithm	Proportion of cooperation strategies	Proportion of competition strategies	Proportion of inaction strategies
Algorithm of this article	0.54	0.23	0.23
Nash-Q learning algorithm	0.51	0.25	0.24
Monte Carlo method	0.47	0.27	0.26
Genetic algorithm	0.45	0.26	0.29
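As a quick arithmetic check on the comparison table above, the sketch below copies the reported proportions, verifies that each row sums to one, and computes how much higher the cooperation proportion of the article's algorithm is than each baseline. The values come from the table; the names `RESULTS`, `rows_sum_to_one`, and `cooperation_gain` are hypothetical.

```python
# (cooperation, competition, inaction) proportions copied from the table.
RESULTS = {
    "Algorithm of this article": (0.54, 0.23, 0.23),
    "Nash-Q learning algorithm": (0.51, 0.25, 0.24),
    "Monte Carlo method": (0.47, 0.27, 0.26),
    "Genetic algorithm": (0.45, 0.26, 0.29),
}


def rows_sum_to_one(results, tol=1e-9):
    """Check that the three proportions in every row add up to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in results.values())


def cooperation_gain(results, reference):
    """Difference in cooperation proportion between the reference row and each other row."""
    base = results[reference][0]
    return {
        name: round(base - row[0], 2)
        for name, row in results.items()
        if name != reference
    }
```

With the table's values, the cooperation proportion of the article's algorithm exceeds Nash-Q learning by 0.03, the Monte Carlo method by 0.07, and the genetic algorithm by 0.09.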
