Filtering Learning Histories Enhances In-Context Reinforcement Learning

Source: arXiv
Abstract

Transformer models (TMs) have exhibited remarkable in-context reinforcement learning (ICRL) capabilities, allowing them to generalize to and improve in previously unseen environments without re-training or fine-tuning. This is typically accomplished by imitating the complete learning histories of a source RL algorithm over a large number of pretraining environments, which, however, may transfer suboptimal behaviors inherited from the source algorithm or dataset. In this work, we address the issue of inherited suboptimality from the perspective of dataset preprocessing. Motivated by the success of weighted empirical risk minimization, we propose a simple yet effective approach, learning history filtering (LHF), to enhance ICRL by reweighting and filtering the learning histories based on their improvement and stability characteristics. To the best of our knowledge, LHF is the first approach to avoid source suboptimality through dataset preprocessing, and it can be combined with current state-of-the-art (SOTA) ICRL algorithms. We substantiate the effectiveness of LHF through a series of experiments on well-known ICRL benchmarks, encompassing both discrete environments and continuous robotic manipulation tasks, with three SOTA ICRL algorithms (AD, DPT, DICP) as the backbones. LHF exhibits robust performance across a variety of suboptimal scenarios, as well as under varying hyperparameters and sampling strategies. Notably, the advantage of LHF becomes more pronounced in the presence of noisy data, indicating the significance of filtering learning histories.
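The abstract describes reweighting and filtering learning histories by their improvement and stability but does not give the formulas. The Python sketch below is only an illustration of that general idea, not the authors' LHF procedure. Everything in it is an assumption made for the example: the function names `history_scores` and `sampling_weights`, the definition of improvement as late-episode return minus early-episode return, the definition of instability as the standard deviation of episode-to-episode changes, and the parameters `stability_weight` and `keep_fraction`.

```python
import numpy as np


def history_scores(returns, stability_weight=1.0):
    """Score learning histories; each row holds one history's per-episode returns.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    improvement = mean return of the last quarter of episodes
                  minus mean return of the first quarter;
    instability = standard deviation of episode-to-episode return changes.
    """
    returns = np.asarray(returns, dtype=float)
    n_episodes = returns.shape[1]
    k = max(1, n_episodes // 4)
    improvement = returns[:, -k:].mean(axis=1) - returns[:, :k].mean(axis=1)
    instability = np.diff(returns, axis=1).std(axis=1)
    return improvement - stability_weight * instability


def sampling_weights(scores, keep_fraction=0.5):
    """Filter out low-scoring histories and reweight the rest.

    Histories outside the top `keep_fraction` get weight 0; the kept ones
    get softmax weights over their scores, so the weights sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    keep = np.argsort(scores)[-n_keep:]  # indices of the highest scores
    weights = np.zeros(len(scores))
    exp_scores = np.exp(scores[keep] - scores[keep].max())  # stable softmax
    weights[keep] = exp_scores / exp_scores.sum()
    return weights
```

The resulting weights could then serve as sampling probabilities when drawing histories for pretraining, which is one way to realize a weighted empirical risk; how the weights enter the training loss of AD, DPT, or DICP is not specified in the abstract and is left open here.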

Weiqin Chen, Xinjie Zhang, Dharmashankar Subramanian, Santiago Paternain

Computing technology, computer technology

Weiqin Chen, Xinjie Zhang, Dharmashankar Subramanian, Santiago Paternain. Filtering Learning Histories Enhances In-Context Reinforcement Learning [EB/OL]. (2025-05-21) [2025-06-13]. https://arxiv.org/abs/2505.15143.
