Elevating Styled Mahjong Agents with Learning from Demonstration
A wide variety of bots enriches the gameplay experience and enhances replayability. Recent advances in game artificial intelligence have focused predominantly on improving the proficiency of bots. Developing highly competent bots with a wide range of distinct play styles, however, remains relatively under-explored. We take Mahjong as a case study. The high degree of randomness inherent in Mahjong and the prevalence of out-of-distribution states cause existing offline learning and Learning-from-Demonstration (LfD) algorithms to perform suboptimally. In this paper, we leverage the gameplay histories of existing Mahjong agents and propose a novel LfD algorithm that requires only minimal modifications to the Proximal Policy Optimization (PPO) algorithm. Comprehensive empirical results show that the proposed method not only significantly improves the proficiency of the agents but also effectively preserves their unique play styles.
Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li
Computing technology, computer technology
Lingfeng Li, Yunlong Lu, Yongyi Wang, Wenxin Li. Elevating Styled Mahjong Agents with Learning from Demonstration [EB/OL]. (2025-06-20) [2025-07-16]. https://arxiv.org/abs/2506.16995.