
Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors

Source: arXiv

Abstract

In this paper, we consider the problem of recovering time-varying reward functions from either optimal policies or demonstrations coming from a max entropy reinforcement learning problem. This problem is highly ill-posed without additional assumptions on the underlying rewards. However, in many applications, the rewards are indeed parsimonious, and some prior information is available. We consider two such priors on the rewards: 1) rewards are mostly constant and they change infrequently, 2) rewards can be represented by a linear combination of a small number of feature functions. We first show that the reward identification problem with the former prior can be recast as a sparsification problem subject to linear constraints. Moreover, we give a polynomial-time algorithm that solves this sparsification problem exactly. Then, we show that identifying rewards representable with the minimum number of features can be recast as a rank minimization problem subject to linear constraints, for which convex relaxations of rank can be invoked. In both cases, these observations lead to efficient optimization-based reward identification algorithms. Several examples are given to demonstrate the accuracy of the recovered rewards as well as their generalizability.
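For intuition about the two priors, the following minimal numpy sketch constructs a time-varying reward matrix that satisfies both: its temporal differences are sparse (few change points), and it is low-rank because it is generated by a small number of feature functions. The features, dimensions, and change point here are hypothetical and purely illustrative; this is not the paper's recovery algorithm, only a demonstration of the structure the priors encode (the nuclear norm, i.e. the sum of singular values, being the standard convex surrogate for rank mentioned in the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)

# Time-varying reward over 5 states and a horizon of 12 steps (hypothetical sizes).
# Prior 1: rewards are mostly constant in time and change infrequently.
# Prior 2: rewards are a linear combination of a small number of feature functions.
n_states, horizon, n_features = 5, 12, 2

# Two feature functions over states (illustrative random features).
Phi = rng.standard_normal((n_states, n_features))

# Feature weights that change only once over the horizon (one change point at t = 6).
W = np.zeros((n_features, horizon))
W[:, :6] = rng.standard_normal((n_features, 1))   # constant segment 1
W[:, 6:] = rng.standard_normal((n_features, 1))   # constant segment 2

R = Phi @ W  # reward matrix: R[s, t] = reward of state s at time t

# Prior 1: the temporal differences R[:, t+1] - R[:, t] are column-sparse.
diffs = np.diff(R, axis=1)
n_changes = np.count_nonzero(np.linalg.norm(diffs, axis=0) > 1e-9)

# Prior 2: rank(R) <= n_features; the nuclear norm (sum of singular values)
# is the convex relaxation of rank invoked for such problems.
rank_R = np.linalg.matrix_rank(R)
nuclear_norm = np.linalg.svd(R, compute_uv=False).sum()

print(n_changes, rank_R)
```

Under the first prior, recovery would penalize the number of nonzero columns of `diffs`; under the second, it would minimize `nuclear_norm` subject to the linear constraints induced by the observed optimal policy.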

Mohamad Louai Shehab, Alperen Tercan, Necmiye Ozay

Subject: Computing and Computer Technology

Mohamad Louai Shehab, Alperen Tercan, Necmiye Ozay. Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors [EB/OL]. (2025-08-10) [2025-08-24]. https://arxiv.org/abs/2508.07400.
