Research on virtual entity decision model for LVC tactical confrontation of army units

doi:10.23919/JSEE.2022.000119

Journal of Systems Engineering and Electronics, 2022, Vol. 33, Issue (5): 1249-1267. doi: 10.23919/JSEE.2022.000119

• RELIABILITY •

Ang GAO, Qisheng GUO, Zhiming DONG*, Zaijiang TANG, Ziwei ZHANG, Qiqi FENG

¹ Military Exercise and Training Center, Army Academy of Armored Forces, Beijing 100072, China

Received: 2020-10-12 Accepted: 2022-06-16 Online: 2022-10-27 Published: 2022-10-27
Contact: Zhiming DONG. E-mail: 15689783388@163.com; 236211566@qq.com; dong_zhiming@163.com; tangzaijiang@sina.com; gaoang370829@sohu.com; 594472717@qq.com
About author:

GAO Ang was born in 1988. He received his Ph.D. degree in science of military equipment from Army Academy of Armored Forces. He is a Ph.D. candidate in Army Academy of Armored Forces. His research interest is intelligent decision-making of computer generated forces based on multi-agent deep reinforcement learning. E-mail: 15689783388@163.com

GUO Qisheng was born in 1962. He received his Ph.D. degree in science of military equipment from Tsinghua University. His research interests are equipment requirement demonstration and equipment test. E-mail: 236211566@qq.com

DONG Zhiming was born in 1977. He received his Ph.D. degree in science of military equipment from Army Academy of Armored Forces. His research interests are equipment requirement demonstration and equipment test. E-mail: dong_zhiming@163.com

TANG Zaijiang was born in 1976. He received his Ph.D. degree in science of military equipment from Army Academy of Armored Forces. His research interest is battle simulation. E-mail: tangzaijiang@sina.com

ZHANG Ziwei was born in 1986. He received his Ph.D. degree in science of military equipment from Army Academy of Armored Forces. He is a Ph.D. candidate in Army Academy of Armored Forces. His research interest is equipment test evaluation. E-mail: gaoang370829@sohu.com

FENG Qiqi was born in 1992. She received her M.S. degree in science of military equipment from Army Academy of Armored Forces. She is pursuing her Ph.D. degree in Army Academy of Armored Forces. Her research interest is real-time research of live virtual constructive. E-mail: 594472717@qq.com
Supported by:
This work was supported by the Military Scientific Research Project (41405030302; 41401020301).

Abstract

The live-virtual-constructive (LVC) tactical confrontation (TC) places four requirements on the virtual entity (VE) decision model: graded combat capability, diversified actions, real-time decision-making, and generalization to the enemy. To meet these requirements, the confrontation process is modeled as a zero-sum stochastic game (ZSG). The theory of the dynamic relative power potential field is introduced to address reward sparsity in the model, and reward shaping is used to address credit assignment between agents. Based on the idea of meta-learning, an extensible multi-agent deep reinforcement learning (EMADRL) framework and solving method are proposed to improve the effectiveness and efficiency of model solving. Experiments show that the model meets the requirements well and that the algorithm learns efficiently.

Key words: live-virtual-constructive (LVC), army unit, tactical confrontation (TC), intelligent decision model, multi-agent deep reinforcement learning

Ang GAO, Qisheng GUO, Zhiming DONG, Zaijiang TANG, Ziwei ZHANG, Qiqi FENG. Research on virtual entity decision model for LVC tactical confrontation of army units[J]. Journal of Systems Engineering and Electronics, 2022, 33(5): 1249-1267.

References

1	BEST C, FLTLT B R. Science and technology enablers of live virtual constructive training in the air domain. Air & Space Power Journal, 2018, 32(4): 59-73.
2	CHEN B, WANG J, WANG Y. Intelligent virtual training partner in embedded training system of fighter. Acta Aeronautica et Astronautica Sinica, 2020, 41(6): 523467. doi: 10.7527/S1000-6839.2019.23467
3	OU Q, HE X Y, TAO J Y. Overview of cooperative target assignment. Journal of System Simulation, 2019, 31(11): 2216-2227.
4	HERNANDEZ-LEAL P, KARTAL B, TAYLOR M E. A survey and critique of multiagent deep reinforcement learning. Autonomous Agents and Multi-agent Systems, 2019, 33(6): 750-797. doi: 10.1007/s10458-019-09421-1
5	MCGREW J S, HOW J P, WILLIAMS B, et al. Air-combat strategy using approximate dynamic programming. Journal of Guidance, Control, and Dynamics, 2010, 33(5): 1641-1654. doi: 10.2514/1.46815
6	ZHANG J, WANG G, YUE S H, et al. Multi-agent system application in accordance with game theory in bi-directional coordination network model. Journal of Systems Engineering and Electronics, 2020, 31(2): 279-289. doi: 10.23919/JSEE.2020.000006
7	CHEN Y F, SUN X P, LIU D J, et al. Optimal guidance method for UCAV in close free air combat[C]//Proc. of the IEEE International Conference on Ubiquitous Computing & Communications and Data Science and Computational Intelligence and Smart Computing, Networking and Services, 2019: 356-360.
8	LIU B, ZHANG X P, WANG R, et al. Air combat decision making for coordinated multiple target attack using combinatorial auction. Acta Aeronautica et Astronautica Sinica, 2010, 31(7): 1433-1444.
9	XU A, YU L, KOU Y X, et al. Stealthy engagement maneuvering strategy for air combat based on MDP. Journal of Systems Engineering and Electronics, 2011, 33(5): 1063-1068.
10	XU X M, YANG R N, FU Y. Situation assessment for air combat based on novel semi-supervised naive Bayes. Journal of Systems Engineering and Electronics, 2018, 29(4): 768-779. doi: 10.21629/JSEE.2018.04.11
11	RUAN C W, ZHOU Z L, LIU H Q. Task assignment under constraint of timing sequential for cooperative air combat. Journal of Systems Engineering and Electronics, 2016, 27(4): 836-844. doi: 10.21629/JSEE.2016.04.12
12	ROESSINGH J J, TOUBMAN A, VAN OIJEN J, et al. Machine learning techniques for autonomous agents in military simulations[C]//Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, 2017: 3445-3450.
13	STEIN G, GONZALEZ A J, BARHAM C. Combining NEAT and PSO for learning tactical human behavior. Springer London, 2015, 26(4): 747-764.
14	CHEN X L, ZHANG Y L. Research on tactical decision-making of army units based on deep reinforcement learning. Military Operations and Systems Engineering, 2017, 31(3): 20-27, 57.
15	XU X, YANG M, LI G, et al. HTN guided game tree search for adaptive CGF commander behavior modeling[C]//Proc. of the IEEE International Conference on Agents, 2017: 78-83.
16	KAMRANI F, LUOTSINEN L J, LØVLID R A. Learning objective agent behavior using a data-driven modeling approach[C]//Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, 2016: 2175-2181.
17	LUOTSINEN L J, KAMRANI F, HAMMAR P, et al. Evolved creative intelligence for computer generated forces[C]//Proc. of the IEEE International Conference on Systems, Man, and Cybernetics, 2016: 3063-3070.
18	TOGHIANI-RIZI B. Evaluation of deep learning methods for creating synthetic actors. Sweden: Uppsala University Press, 2017.
19	XU X, YANG M, LI G. Adaptive CGF commander behavior modeling through HTN guided Monte Carlo tree search. Journal of Systems Science and Systems Engineering, 2018, 27(2): 231-249. doi: 10.1007/s11518-018-5366-8
20	XUAN H J, XIANG Y, HE X Q, et al. Multi-objective optimization model and algorithm for weapon-target assignment problem in joint fire strike. Journal of Xinyang Normal University (Natural Science Edition), 2019, 32(4): 664-669.
21	KONG D P, CHANG T Q, HAO N, et al. Confrontation-based cooperative fire strike decision-making method of assault weapons and support weapons. Acta Armamentarii, 2019, 40(3): 184-195.

Method category	Representative method	Graded combat capability	Diversified actions	Real-time decisions
Methods based on knowledge, reasoning, and planning	Rule-based reasoning, context-based reasoning, case-based reasoning, finite state machine	D	D	A
Methods based on problem solving	Mathematical solution [5,6]	D	D	C
Methods based on problem solving	Machine search [7-9]	D	D	B
Methods based on uncertain knowledge reasoning	Bayesian network [10], fuzzy theory, utility theory [11]	D	B	A
Methods based on autonomous learning [12]	Deep learning, DRL	C	A	A

Element	Meaning
l_j	The grid in which j is currently located
h_j	The intact state value of j
t_j	The weapon cooling time of j
p_j	The relative power potential field (RPPF) of j
e_j	A list of enemy units within the detection range of j
eh_j	The intact state value of each enemy unit within the detection range of j
et_j	The weapon cooling time of each enemy unit within the detection range of j
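
As a minimal sketch, the elements in the table above could be held in a single record per unit j. The class name `UnitObservation`, the field names, the field types, and the flattening order in `as_vector` are all assumptions made for illustration; the table itself only gives the symbols and their meanings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnitObservation:
    """Per-unit elements from the table above (names and types are assumed)."""
    grid: int        # l_j: grid in which unit j is currently located
    intact: float    # h_j: intact state value of j
    cooldown: float  # t_j: weapon cooling time of j
    rppf: float      # p_j: relative power potential field of j
    enemies: List[int] = field(default_factory=list)           # e_j
    enemy_intact: List[float] = field(default_factory=list)    # eh_j
    enemy_cooldown: List[float] = field(default_factory=list)  # et_j

    def __post_init__(self) -> None:
        # eh_j and et_j hold one value per enemy listed in e_j.
        if not (len(self.enemies) == len(self.enemy_intact)
                == len(self.enemy_cooldown)):
            raise ValueError("enemy lists must have one entry per detected enemy")

    def as_vector(self) -> List[float]:
        """Flatten to a list of numbers: own elements first, then each enemy."""
        flat = [float(self.grid), self.intact, self.cooldown, self.rppf]
        for enemy_id, eh, et in zip(self.enemies, self.enemy_intact,
                                    self.enemy_cooldown):
            flat += [float(enemy_id), eh, et]
        return flat
```

For example, `UnitObservation(2646, 1.0, 0.0, 0.5, [3, 7], [1.0, 0.4], [0.0, 2.0]).as_vector()` yields four own-unit values followed by three values per detected enemy; the numbers in this call are made up.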

Subtask	Episode/winning rate
Plan A	2501-2600	2601-2700	2701-2800	2801-2900	2901-3000
Plan A	0.95	0.93	0.98	0.94	0.93
Plan B	6501-6600	6601-6700	6701-6800	6801-6900	6901-7000
Plan B	0.97	0.90	1	1	1
Plan C	1501-1600	1601-1700	1701-1800	1801-1900	1901-2000
Plan C	0.93	0.93	0.98	0.99	0.97
Plans A, B, C appear randomly	601-700	701-800	801-900	901-1000	1001-1100
Plans A, B, C appear randomly	0.98	0.99	0.98	0.98	0.93

Episode	Timeline
1	2646	2746	2846	2946	3046	2947	2848	2847	2846	2746	2745	2744
2	2646	2746	2846	2847	2947	3047	3147	3048	2948	2848
3	2446	2346	2246	2247	2348	2246	2146	2046
4	2446	2445	2345	2245	2145	2045	1945	1845	1745	1744
5	2446	2346	2246	2346	2446	2346	2246	2146
6	2446	2346	2246	2146	2046	1946	1846	1746	1646
7	2446	2346	2347	2348	2248	2148	2048	1948	1848	1748	1747	1746	1646	1546
8	2446	2346	2246	2346	2246	2146	2046	1946	1846	1845	1844
9	2446	2346	2246	2146	2046	1946	1846	1746	1745	1744
10	2547	2447	2547	2447	2347	2247	2147	2047	1947	1946	1945

Episode	1	2	3	4	5	6	7	8	9	10
1	-	0.700	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
2	-	-	1.000	1.000	1.000	1.000	1.000	1.000	1.000	1.000
3	-	-	-	0.875	0.500	0.500	0.625	0.500	0.375	0.875
4	-	-	-	-	0.875	0.889	0.900	0.800	0.800	0.900
5	-	-	-	-	-	0.625	0.750	0.500	0.500	1.000
6	-	-	-	-	-	-	0.556	0.222	0.222	0.889
7	-	-	-	-	-	-	-	0.818	0.700	1.000
8	-	-	-	-	-	-	-	-	0.400	0.909
9	-	-	-	-	-	-	-	-	-	0.900
10	-	-	-	-	-	-	-	-	-	-
Average difference degree	0.802
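
The table does not state how the average difference degree is formed. A minimal check, assuming it is the plain mean of the 45 pairwise entries above the diagonal, is sketched below; the variable names are illustrative only.

```python
# Entries above the diagonal of the table, one list per row:
# row i holds the difference degree between episode i and episodes i+1..10.
upper_triangle = [
    [0.700, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000],
    [1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000],
    [0.875, 0.500, 0.500, 0.625, 0.500, 0.375, 0.875],
    [0.875, 0.889, 0.900, 0.800, 0.800, 0.900],
    [0.625, 0.750, 0.500, 0.500, 1.000],
    [0.556, 0.222, 0.222, 0.889],
    [0.818, 0.700, 1.000],
    [0.400, 0.909],
    [0.900],
]

# Flatten the rows and take the arithmetic mean over all episode pairs.
pair_values = [value for row in upper_triangle for value in row]
average_difference = sum(pair_values) / len(pair_values)

print(len(pair_values), round(average_difference, 3))
```

Under that assumption the mean over the 45 pairs rounds to 0.802, which matches the reported value.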

Real combat	Combat simulation
Incomplete information game	Complete information game
Imperfect information game	Imperfect information game
Reward function is complicated	Reward function is relatively simple
Sample data is scarce	Sample data is huge