
Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives

Source: arXiv
Abstract

Recent advances in reinforcement learning (RL) have renewed focus on the design of reward functions that shape agent behavior. Manually designing reward functions is tedious and error-prone. A principled alternative is to specify behaviors in a formal language that can be automatically translated into rewards. Omega-regular languages are a natural choice for this purpose, given their established role in formal verification and synthesis. However, existing methods using omega-regular specifications typically rely on discounted reward RL in episodic settings, with periodic resets. This setup misaligns with the semantics of omega-regular specifications, which describe properties over infinite behavior traces. In such cases, the average reward criterion and the continuing setting -- where the agent interacts with the environment over a single, uninterrupted lifetime -- are more appropriate. To address the challenges of infinite-horizon, continuing tasks, we focus on absolute liveness specifications -- a subclass of omega-regular languages that cannot be violated by any finite behavior prefix, making them well-suited to the continuing setting. We present the first model-free RL framework that translates absolute liveness specifications to average-reward objectives. Our approach enables learning in communicating MDPs without episodic resetting. We also introduce a reward structure for lexicographic multi-objective optimization, aiming to maximize an external average-reward objective among the policies that also maximize the satisfaction probability of a given omega-regular specification. Our method guarantees convergence in unknown communicating MDPs and supports on-the-fly reductions that do not require full knowledge of the environment, thus enabling model-free RL. Empirical results show that our average-reward approach in the continuing setting outperforms discount-based methods across benchmarks.
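As background for the average-reward, continuing setting the abstract contrasts with discounted, episodic RL, the sketch below shows generic tabular differential Q-learning on a tiny two-state communicating MDP. This is purely illustrative: the MDP, step sizes, and function names are assumptions, not the authors' construction, and no omega-regular specification machinery is modeled here.

```python
import random

# Hypothetical two-state communicating MDP (illustration only).
# Action 0 stays in the current state; action 1 switches states.
# Reward 1 is earned only when taking action 1 in state 1, so the
# gain-optimal policy alternates between the states (average reward 0.5).
def step(state, action):
    if action == 1:
        return 1 - state, 1.0 if state == 1 else 0.0
    return state, 0.0

def differential_q_learning(steps=50000, alpha=0.1, eta=0.1, eps=0.1, seed=0):
    """Average-reward (differential) Q-learning in the continuing setting:
    one uninterrupted stream of experience, no episodic resets, no discount."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    r_bar = 0.0  # running estimate of the average reward (gain)
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection on the differential Q-values.
        if rng.random() < eps:
            a = rng.choice((0, 1))
        else:
            a = max((0, 1), key=lambda b: q[(s, b)])
        s2, r = step(s, a)
        # TD error subtracts the average-reward estimate instead of
        # discounting the bootstrap target.
        delta = r - r_bar + max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)]
        q[(s, a)] += alpha * delta
        r_bar += eta * alpha * delta
        s = s2
    return q, r_bar
```

On this deterministic MDP the average-reward estimate settles near the optimal gain of 0.5, and the learned Q-values prefer the alternating action in both states.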

Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez

Subject: Computing Technology; Computer Technology

Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez. Average Reward Reinforcement Learning for Omega-Regular and Mean-Payoff Objectives [EB/OL]. (2025-05-21) [2025-06-03]. https://arxiv.org/abs/2505.15693.