Bayesian Design Principles for Offline-to-Online Reinforcement Learning

Abstract

Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma. Instead of adopting optimistic or pessimistic policies, the agent should act in a way that matches its belief in optimal policies. Such a probability-matching agent can avoid a sudden performance drop while still being guaranteed to find the optimal policy. Based on our theoretical findings, we introduce a novel algorithm that outperforms existing methods on various benchmarks, demonstrating the efficacy of our approach. Overall, the proposed approach provides a new perspective on offline-to-online RL that has the potential to enable more effective learning from offline data.
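
The contrast the abstract draws between pessimistic, optimistic, and probability-matching behavior can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a two-armed Bernoulli bandit with Beta posteriors initialized from hypothetical offline counts, and it uses Thompson sampling as a standard instance of probability matching. The names `select_arm` and `fine_tune`, the `scale` bonus parameter, and all numbers are illustrative assumptions.

```python
import random


def select_arm(posteriors, mode, rng, scale=2.0):
    """Pick an arm given Beta(alpha, beta) posteriors over each arm's mean reward."""
    if mode == "probability_matching":
        # Thompson sampling: draw one plausible mean per arm and act greedily on
        # the draw, so each arm is chosen with the posterior probability that it is best.
        scores = [rng.betavariate(a, b) for a, b in posteriors]
    else:
        scores = []
        for a, b in posteriors:
            mean = a / (a + b)
            std = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
            # Pessimism subtracts an uncertainty penalty; optimism adds a bonus.
            bonus = scale * std
            scores.append(mean - bonus if mode == "pessimistic" else mean + bonus)
    return max(range(len(scores)), key=scores.__getitem__)


def fine_tune(true_means, offline_counts, mode, steps=2000, seed=0):
    """Run online fine-tuning and return the average expected reward per step."""
    rng = random.Random(seed)
    # Uniform Beta(1, 1) prior updated with offline (successes, failures) counts.
    posteriors = [(1 + s, 1 + f) for s, f in offline_counts]
    total = 0.0
    for _ in range(steps):
        arm = select_arm(posteriors, mode, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        a, b = posteriors[arm]
        posteriors[arm] = (a + reward, b + 1 - reward)
        total += true_means[arm]
    return total / steps


if __name__ == "__main__":
    # Hypothetical setup: arm 0 is well covered by offline data but mediocre,
    # arm 1 is barely covered but actually better.
    true_means = [0.4, 0.7]
    offline_counts = [(80, 120), (1, 1)]
    for mode in ("pessimistic", "optimistic", "probability_matching"):
        print(mode, round(fine_tune(true_means, offline_counts, mode), 3))
```

In this toy setting the pessimistic rule keeps choosing the well-covered arm because the penalty on the sparsely covered arm never shrinks, while the probability-matching rule tries the uncertain arm in proportion to how likely it is to be best. The sketch does not reproduce the sudden performance drop of optimistic fine-tuning described in the abstract, which depends on the structure of the full RL problem.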

Authors: Tangjie Lv, Ziqing Mai, Hao Hu, Changjie Fan, Chengjie Wu, Yujing Hu, Qianchuan Zhao, Chongjie Zhang, Yiqin Yang, Jianing Ye

Subject classification: computing technology, computer technology

Recommended citation: Tangjie Lv, Ziqing Mai, Hao Hu, Changjie Fan, Chengjie Wu, Yujing Hu, Qianchuan Zhao, Chongjie Zhang, Yiqin Yang, Jianing Ye. Bayesian Design Principles for Offline-to-Online Reinforcement Learning [EB/OL]. (2024-05-31) [2025-08-02]. https://arxiv.org/abs/2405.20984.
