Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks
We introduce a novel hierarchical reinforcement learning (HRL) framework that performs top-down recursive planning via learned subgoals, successfully applied to the complex combinatorial puzzle game Sokoban. Our approach constructs a six-level policy hierarchy, where each higher-level policy generates subgoals for the level below. All subgoals and policies are learned end-to-end from scratch, without any domain knowledge. Our results show that the agent can generate long action sequences from a single high-level call. While prior work has explored 2-3 level hierarchies and subgoal-based planning heuristics, we demonstrate that deep recursive goal decomposition can emerge purely from learning, and that such hierarchies can scale effectively to hard puzzle domains.
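The abstract describes a top-down recursive scheme in which a level-k policy proposes a subgoal that the level below pursues, down to primitive actions. A minimal sketch of that control flow, with all names and the toy policies hypothetical (the paper's learned networks and subgoal representations are not specified here):

```python
# Hypothetical sketch of top-down recursive subgoal planning: a level-k
# policy proposes a subgoal for level k-1; level 0 emits a primitive action.
def act(policies, level, state, goal):
    """Expand one high-level call into a primitive action sequence."""
    if level == 0:
        return [policies[0](state, goal)]   # lowest level: primitive action
    subgoal = policies[level](state, goal)  # higher level: propose a subgoal
    return act(policies, level - 1, state, subgoal)

# Toy six-level hierarchy: each "policy" halves the distance to its goal.
policies = [lambda s, g: (s + g) // 2 for _ in range(6)]
actions = act(policies, 5, 0, 64)
```

In the paper's setting each recursive call at a higher level would expand into many lower-level calls, so a single top-level invocation yields a long primitive action sequence.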
Sergey Pastukhov
Subject: Computing and Computer Technology
Sergey Pastukhov. Solving Sokoban using Hierarchical Reinforcement Learning with Landmarks [EB/OL]. (2025-04-06) [2025-04-27]. https://arxiv.org/abs/2504.04366.