Central Path Proximal Policy Optimization

Abstract

In constrained Markov decision processes, enforcing constraints during training is often thought of as decreasing the final return. Recently, it was shown that constraints can be incorporated directly in the policy geometry, yielding an optimization trajectory close to the central path of a barrier method, which does not compromise final return. Building on this idea, we introduce Central Path Proximal Policy Optimization (C3PO), a simple modification of PPO that produces policy iterates that stay close to the central path of the constrained optimization problem. Compared to existing on-policy methods, C3PO delivers improved performance with tighter constraint enforcement, suggesting that central path-guided updates offer a promising direction for constrained policy optimization.
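The abstract refers to the central path of a barrier method without defining it. As a rough illustration only, the sketch below traces the central path of a one-dimensional toy problem: maximize -(x - 2)^2 subject to x <= 1, with a log-barrier term (1/t) * log(1 - x). This is not the C3PO algorithm and involves no policy or MDP; the objective, the constraint limit, and the function names `barrier_grad` and `central_path_point` are all assumptions chosen for the example.

```python
import math


def barrier_grad(x, t, target=2.0, limit=1.0):
    """Derivative of the barrier objective -(x - target)^2 + (1/t) * log(limit - x)."""
    return -2.0 * (x - target) - 1.0 / (t * (limit - x))


def central_path_point(t, target=2.0, limit=1.0, iters=200):
    """Maximizer of the barrier objective for a given barrier parameter t.

    The barrier objective is strictly concave on x < limit, so its derivative
    is strictly decreasing there and bisection on the sign of the derivative
    locates the unique maximizer. The bracket below assumes the toy defaults.
    """
    lo, hi = limit - 10.0, limit - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if barrier_grad(mid, t, target, limit) > 0.0:
            lo = mid  # objective still increasing: maximizer lies to the right
        else:
            hi = mid  # objective decreasing: maximizer lies to the left
    return 0.5 * (lo + hi)


def closed_form(t):
    """Closed-form central path for the toy defaults (target=2, limit=1)."""
    return (3.0 - math.sqrt(1.0 + 2.0 / t)) / 2.0


if __name__ == "__main__":
    for t in (1.0, 10.0, 100.0, 1000.0):
        print(t, central_path_point(t), closed_form(t))
```

As t grows, the points stay strictly inside the feasible region x < 1 and approach the constrained optimum x = 1 from the interior. That interior approach is the property the abstract associates with central-path-guided updates.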

Authors: Nikola Milosevic, Johannes Müller, Nico Scherf

Subject classification: Computing technology, computer technology

Recommended citation: Nikola Milosevic, Johannes Müller, Nico Scherf. Central Path Proximal Policy Optimization [EB/OL]. (2025-05-31) [2025-06-28]. https://arxiv.org/abs/2506.00700.
