
Reliability-Adjusted Prioritized Experience Replay

Source: arXiv
Abstract

Experience replay enables data-efficient learning from past experiences in online reinforcement learning agents. Traditionally, experiences were sampled uniformly from a replay buffer, regardless of differences in experience-specific learning potential. In an effort to sample more efficiently, researchers introduced Prioritized Experience Replay (PER). In this paper, we propose an extension to PER by introducing a novel measure of temporal difference error reliability. We theoretically show that the resulting transition selection algorithm, Reliability-adjusted Prioritized Experience Replay (ReaPER), enables more efficient learning than PER. We further present empirical results showing that ReaPER outperforms PER across various environment types, including the Atari-10 benchmark.
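The abstract does not specify how the reliability measure is computed, so the following is only a minimal sketch of the proportional Prioritized Experience Replay baseline that ReaPER extends, with a hypothetical `reliability` weight folded into the priority update to indicate where such an adjustment would plug in. The class name and all parameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Sketch of proportional PER; priorities follow |TD error|^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly TD error shapes sampling
        self.eps = eps            # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by
        # non-uniform sampling, as in standard PER.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update(self, idx, td_errors, reliability=1.0):
        # Standard PER sets priority proportional to |TD error|. A
        # reliability-adjusted variant (hypothetical here) would scale the
        # priority down when the TD error is judged unreliable.
        self.priorities[idx] = (np.abs(td_errors) + self.eps) * reliability
```

With `reliability=1.0` this reduces to ordinary proportional PER; ReaPER's contribution, per the abstract, lies in how that reliability signal is defined, which the paper itself develops.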

Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer

Computing Technology, Computer Technology

Leonard S. Pleiss, Tobias Sutter, Maximilian Schiffer. Reliability-Adjusted Prioritized Experience Replay [EB/OL]. (2025-07-03) [2025-07-16]. https://arxiv.org/abs/2506.18482.
