Jiu fusion artificial intelligence (JFA): a two-stage reinforcement learning model with hierarchical neural networks and human knowledge for Tibetan Jiu chess
Xiali LI, Xiaoyu FAN, Junzhi YU, Zhicheng DONG, Xianmu CAIRANG, Ping LAN
Front. Inform. Technol. Electron. Eng., 2025, Vol. 26, Issue 10: 1969-1983.
Tibetan Jiu chess, recognized as a national intangible cultural heritage, is a complex game comprising two distinct phases: the layout phase and the battle phase. Improving the performance of deep reinforcement learning (DRL) models for Tibetan Jiu chess is challenging, especially under hardware resource constraints. To address this, we propose a two-stage model called JFA, which incorporates hierarchical neural networks and knowledge-guided techniques. The model comprises two sub-models: a strategic layout model (SLM) for the layout phase and a hierarchical battle model (HBM) for the battle phase. Both sub-models use similar network structures and employ parallel Monte Carlo tree search (MCTS) for independent self-play training. HBM is structured as a hierarchical neural network, with the upper network selecting movement and jump-capturing actions and the lower network handling square-capturing actions. Auxiliary agents based on human knowledge are introduced to assist SLM and HBM by simulating the entire game and providing reward signals based on square-capturing or victory outcomes. Additionally, within HBM, we propose two human-knowledge-based pruning methods that prune the parallel MCTS and the capture actions in the lower network. In experiments against a layout model trained with the AlphaZero method, SLM achieves a 74% win rate while reducing decision-making time to roughly 1/147 of that required by the AlphaZero model. SLM also won first place at the 2024 China National Computer Game Tournament. HBM achieves a 70% win rate against other Tibetan Jiu chess models. Used together in JFA, SLM and HBM achieve an 81% win rate, comparable to the level of a human amateur 4-dan player. These results demonstrate that JFA effectively enhances artificial intelligence (AI) performance in Tibetan Jiu chess.
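The hierarchical routing described for HBM (an upper network for movement and jump-capturing actions, a lower network for square-capturing actions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the policy networks are stubbed with random scores, and all names, action encodings, and the routing condition are assumptions.

```python
import random

def upper_policy(state, actions):
    """Stub for the upper network: scores movement and jump-capturing actions."""
    return {a: random.random() for a in actions if a[0] in ("move", "jump")}

def lower_policy(state, actions):
    """Stub for the lower network: scores square-capturing actions."""
    return {a: random.random() for a in actions if a[0] == "capture"}

def select_action(state, legal_actions, square_formed):
    """Route the decision between the two networks (illustrative logic):
    the lower network acts only when a square has been formed and a capture
    is available; otherwise the upper network decides."""
    if square_formed:
        scores = lower_policy(state, legal_actions)
        if scores:
            return max(scores, key=scores.get)
    scores = upper_policy(state, legal_actions)
    return max(scores, key=scores.get)

# Example: with a square formed, the capture branch is taken.
actions = [("move", 3), ("jump", 7), ("capture", 5)]
print(select_action(None, actions, square_formed=True))   # ("capture", 5)
print(select_action(None, actions, square_formed=False))  # a move or jump action
```

In this sketch, the hypothetical human-knowledge pruning of capture actions would correspond to filtering the action set passed to `lower_policy` before scoring.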
Games / Reinforcement learning / Tibetan Jiu chess / Separate two-stage model / Self-play / Hierarchical neural network / Parallel Monte Carlo tree search
Zhejiang University Press