
State-Constrained Offline Reinforcement Learning

Source: arXiv

Abstract

Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of distributional shift but restricting the policy to seen actions. In this paper, we alleviate this limitation by introducing state-constrained offline RL, a novel framework that focuses solely on the dataset's state distribution. This approach allows the policy to take high-quality out-of-distribution actions that lead to in-distribution states, significantly enhancing learning potential. The proposed setting not only broadens the learning horizon but also improves the ability to combine different trajectories from the dataset effectively, a desirable property inherent in offline RL. Our research is underpinned by theoretical findings that pave the way for subsequent advancements in this area. Additionally, we introduce StaCQ, a deep learning algorithm that achieves state-of-the-art performance on the D4RL benchmark datasets and aligns with our theoretical propositions. StaCQ establishes a strong baseline for forthcoming explorations in this domain.
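To illustrate the distinction the abstract draws between batch-constrained and state-constrained filtering, here is a minimal, hypothetical sketch. It is not the paper's StaCQ algorithm: the toy dataset, the linear dynamics used as a stand-in for a learned model, the distance threshold `eps`, and the function names are all illustrative assumptions. It only shows the idea that a state constraint checks whether an action's predicted next state lies near the dataset's state distribution, rather than whether the (state, action) pair itself appears in the data.

```python
# Illustrative sketch only (assumed toy data and dynamics), not the authors' method.
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset of (state, action, next_state) transitions.
states = rng.normal(size=(500, 2))
actions = rng.uniform(-1.0, 1.0, size=(500, 1))
next_states = states + 0.1 * actions  # assumed simple linear dynamics

def batch_constrained_ok(state, action, eps=0.2):
    """Accept an action only if a similar (state, action) pair exists in the dataset."""
    d = (np.linalg.norm(states - state, axis=1)
         + np.linalg.norm(actions - action, axis=1))
    return d.min() < eps

def state_constrained_ok(state, action, eps=0.2):
    """Accept any action whose predicted next state stays close to dataset states."""
    predicted_next = state + 0.1 * action  # stand-in for a learned dynamics model
    d = np.linalg.norm(next_states - predicted_next, axis=1)
    return d.min() < eps

# A proposed action may be out-of-distribution as a (state, action) pair
# yet still be allowed by the state constraint if it leads back to an
# in-distribution state.
s = states[0]
a_new = np.array([0.95])
print("batch-constrained accepts:", batch_constrained_ok(s, a_new))
print("state-constrained accepts:", state_constrained_ok(s, a_new))
```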

Charles A. Hepburn, Yue Jin, Giovanni Montana

Subject: Computing Technology; Computer Technology

Charles A. Hepburn, Yue Jin, Giovanni Montana. State-Constrained Offline Reinforcement Learning [EB/OL]. (2025-07-14) [2025-07-25]. https://arxiv.org/abs/2405.14374.
