Journal of Systems Engineering and Electronics, 2025, Vol. 36, Issue (5): 1353-1373. doi: 10.23919/JSEE.2025.000115

• CONTROL THEORY AND APPLICATION •

Self-play training and analysis for GEO inspection game with modular actions

Rui ZHOU1,2, Weichao ZHONG3, Wenlong LI3, Hao ZHANG1,2,*

  1. Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. Shanghai Institute of Satellite Engineering, Shanghai 201109, China
  • Received: 2024-09-18 Online: 2025-10-18 Published: 2025-10-24
  • Contact: Hao ZHANG E-mail: zhourui221@mails.ucas.ac.cn; zhongweichaohit@gmail.com; liwenlongzacao@126.com; hao.zhang.zhr@gmail.com
  • About author:
    ZHOU Rui was born in 2000. He received his B.S. degree from Harbin Engineering University, Harbin, China, in 2022, and his M.S. degree in flight vehicle design from the University of Chinese Academy of Sciences. His research interests include spacecraft dynamics and control and the application of artificial intelligence to astronautic systems. E-mail: zhourui221@mails.ucas.ac.cn

    ZHONG Weichao was born in 1985. He received his B.S., M.S., and Ph.D. degrees in aerospace engineering from the Harbin Institute of Technology, Harbin, China, in 2007, 2009, and 2014, respectively. He is currently an engineer with the Shanghai Institute of Satellite Engineering. His research interests include satellite testbeds, dynamics simulation, and optical remote sensing technology. E-mail: zhongweichaohit@gmail.com

    LI Wenlong was born in 1988. He received his Ph.D. degree in spacecraft design from Beijing University of Aeronautics and Astronautics in 2015, specializing in orbital mechanics. He currently works at the Shanghai Institute of Satellite Engineering. His research interest is satellite system design. E-mail: liwenlongzacao@126.com

    ZHANG Hao was born in 1986. He received his B.S. and Ph.D. degrees in aerospace engineering from Beihang University, China, in 2006 and 2012, respectively. From 2012 to 2017, he was with the Asher Space Research Institute, Technion–Israel Institute of Technology, where he was a postdoctoral fellow. He is currently a research fellow with the Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, China. His research interests include distributed space systems and astrodynamics. E-mail: hao.zhang.zhr@gmail.com

Abstract:

This paper studies the impulsive on-orbit inspection game using reinforcement learning and game training methods. The spacecraft aims to inspect the entire surface of a non-cooperative, actively maneuvering target under front lighting. First, the impulsive orbital game is formulated as a turn-based sequential game. Second, several typical relative orbit transfers are encapsulated into modules to construct a parameterized action space containing discrete modules and continuous parameters, and the multi-pass deep Q-network (MPDQN) algorithm is used to implement autonomous decision-making. Then, a curriculum learning method is used to gradually increase the difficulty of the training scenario. A backtracking proportional self-play training framework builds a pool of opponents to enhance the agent's ability to defeat inconsistent strategies. The behavior variations of the agents during training indicate that the intelligent game system gradually evolves towards an equilibrium. The restraint relations between the agents show that the agents steadily improve their strategies. Finally, the influence of various factors on the game results is tested.

Key words: impulsive orbital game, inspection mission, turn-based, reinforcement learning, modular action, self-play
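
The parameterized action space mentioned in the abstract (a discrete module plus that module's continuous parameters) can be illustrated with a minimal Python sketch of multi-pass action selection. This is not the authors' implementation: the module names, parameter counts, state size, and the toy linear Q-function below are all hypothetical stand-ins chosen for illustration. The sketch only shows the general multi-pass idea, in which the Q-value of each module is evaluated with the parameters of all other modules zeroed out.

```python
import numpy as np

# Hypothetical modules and parameter counts; the abstract does not list them.
MODULES = {"hold": 0, "fly_around": 2, "approach": 1}
NAMES = list(MODULES)
SIZES = [MODULES[n] for n in NAMES]
# Slice boundaries of each module's parameters in the joint parameter vector.
OFFSETS = np.concatenate(([0], np.cumsum(SIZES)))
PARAM_DIM = int(OFFSETS[-1])


def multi_pass_q(q_net, state, params):
    """Return one Q-value per module, using one forward pass per module.

    In pass k only module k's parameters are kept and all others are zeroed,
    so Q_k cannot depend on the parameters of other modules.
    """
    q_values = np.empty(len(NAMES))
    for k in range(len(NAMES)):
        masked = np.zeros(PARAM_DIM)
        lo, hi = OFFSETS[k], OFFSETS[k + 1]
        masked[lo:hi] = params[lo:hi]
        q_values[k] = q_net(state, masked)[k]
    return q_values


def select_action(q_net, state, params):
    """Greedy choice: best module together with its own continuous parameters."""
    q_values = multi_pass_q(q_net, state, params)
    k = int(np.argmax(q_values))
    return NAMES[k], params[OFFSETS[k]:OFFSETS[k + 1]]


def toy_q_net(state, masked_params):
    """Stand-in for a trained network: a fixed random linear map (illustrative only)."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=(len(NAMES), state.size + PARAM_DIM))
    return w @ np.concatenate((state, masked_params))


if __name__ == "__main__":
    state = np.ones(6)                   # hypothetical relative-state vector
    params = np.array([0.1, 0.2, 0.3])   # would come from a parameter network
    print(select_action(toy_q_net, state, params))
```

In a full agent, `params` would be produced by a separate actor network and `toy_q_net` would be replaced by a trained Q-network; the masking step is what distinguishes the multi-pass variant from feeding the joint parameter vector in a single pass.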