Policy gradient methods for ordinal policies
In reinforcement learning, the softmax parametrization is the standard approach for policies over discrete action spaces. However, it fails to capture the order relationship between actions. Motivated by a real-world industrial problem, we propose a novel policy parametrization based on ordinal regression models adapted to the reinforcement learning setting. Our approach addresses practical challenges, and numerical experiments demonstrate its effectiveness in real applications and in continuous action tasks, where discretizing the action space and applying the ordinal policy yields competitive performance.
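To illustrate the idea of an order-aware policy, here is a minimal sketch of a cumulative-link (ordinal regression) parametrization over a discrete, ordered action set. The function name, the use of a single scalar score per state, and the threshold construction are illustrative assumptions for this sketch; the paper's exact parametrization may differ.

```python
import numpy as np

def ordinal_policy_probs(score, thresholds):
    """Cumulative-link ordinal distribution over K ordered actions.

    P(a <= k | s) = sigmoid(theta_k - score(s)); the probability of each
    action is the difference of consecutive cumulative probabilities, so
    nearby (ordered) actions get similar mass when the score shifts.
    thresholds must be strictly increasing (K-1 values for K actions).
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    cum = sigmoid(np.asarray(thresholds, dtype=float) - float(score))
    cum = np.concatenate([[0.0], cum, [1.0]])  # pad with P<=0 and P<=K
    return np.diff(cum)                        # per-action probabilities

# Example: 4 ordered actions with a state-dependent scalar score.
probs = ordinal_policy_probs(score=0.5, thresholds=[-1.0, 0.0, 1.0])
```

Unlike a softmax over independent logits, shifting `score` moves probability mass smoothly along the action ordering, which is the property an ordinal policy exploits.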
Simón Weinberger, Jairo Cugliari
Subjects: Computing technology; computer science
Simón Weinberger, Jairo Cugliari. Policy gradient methods for ordinal policies [EB/OL]. (2025-06-23) [2025-07-23]. https://arxiv.org/abs/2506.18614.