Learning-based primal-dual optimal control of discrete-time stochastic systems with multiplicative noise
Reinforcement learning (RL) is an effective approach for solving optimal control problems without exact knowledge of the system model. However, the classical Q-learning method, a model-free RL algorithm, has limitations, such as a lack of rigorous theoretical analysis and the need to inject artificial disturbances during implementation. To address these challenges, this paper studies the partially model-free stochastic linear quadratic regulator (SLQR) problem for a system with multiplicative noise from a primal-dual perspective. This approach lays a solid theoretical foundation for understanding the intrinsic mechanisms of classical RL algorithms. We reformulate the SLQR as a non-convex primal-dual optimization problem and establish a strong duality result, which enables us to derive model-based and model-free algorithms for SLQR optimal policy design from the Karush-Kuhn-Tucker (KKT) conditions. An illustrative example of human arm movement demonstrates the validity of the proposed model-free algorithm and showcases the learning mechanism of the central nervous system.
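For concreteness, the LaTeX sketch below states a standard formulation of the discrete-time SLQR problem with multiplicative noise and a common covariance-based primal reformulation of the kind the abstract alludes to; the symbols A, B, A_1, B_1, Q, R, the scalar noise w_k, and the feedback parametrization u_k = -K x_k are generic placeholders chosen for illustration and are not taken from the paper.

% Discrete-time linear system with state- and control-dependent
% (multiplicative) noise; w_k is i.i.d., zero-mean, unit-variance.
\[
  x_{k+1} = (A + w_k A_1)\, x_k + (B + w_k B_1)\, u_k ,
  \qquad \mathbb{E}[w_k] = 0, \quad \mathbb{E}[w_k^2] = 1 .
\]
% Infinite-horizon quadratic cost minimized over static state feedback
% u_k = -K x_k, with Q \succeq 0 and R \succ 0.
\[
  J(K) = \mathbb{E}\!\left[ \sum_{k=0}^{\infty}
         \bigl( x_k^{\top} Q x_k + u_k^{\top} R u_k \bigr) \right] .
\]
% Covariance-based primal reformulation: with
% X = \sum_{k} \mathbb{E}[x_k x_k^{\top}] and X_0 = \mathbb{E}[x_0 x_0^{\top}],
% the SLQR becomes a constrained program that is non-convex in (K, X);
% attaching a matrix multiplier P to the Lyapunov-type constraint gives the
% KKT (Riccati-type) optimality conditions referred to in the abstract.
\begin{align*}
  \min_{K,\; X \succeq 0} \ &
    \operatorname{tr}\!\bigl( (Q + K^{\top} R K)\, X \bigr) \\
  \text{s.t.} \ \ &
    X = (A - B K)\, X\, (A - B K)^{\top}
      + (A_1 - B_1 K)\, X\, (A_1 - B_1 K)^{\top} + X_0 .
\end{align*}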
Xiushan Jiang, Weihai Zhang
Subject areas: Fundamental theory of automation; computing technology and computer technology
Xiushan Jiang, Weihai Zhang. Learning-based primal-dual optimal control of discrete-time stochastic systems with multiplicative noise [EB/OL]. (2025-06-03) [2025-07-18]. https://arxiv.org/abs/2506.02613.