Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning
Recent advancements in reinforcement learning (RL) have leveraged neural networks to achieve state-of-the-art performance across various control tasks. However, these successes often come at the cost of significant computational resources, as training deep neural networks requires substantial time and data. In this paper, we introduce an actor-critic algorithm that utilizes randomized neural networks to drastically reduce computational costs while maintaining strong performance. Despite its simple architecture, our method effectively solves a range of control problems, including the locomotion control of a highly dynamic 12-motor quadruped robot, and achieves results comparable to leading algorithms such as Proximal Policy Optimization (PPO). Notably, the advantage of our approach lies not in sample efficiency but in wall-clock training time: although our algorithm requires more timesteps to converge to an optimal policy, the actual time spent training is lower.
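The abstract describes training an actor-critic agent whose networks are randomized, which typically means that most weights are sampled once and frozen so that only a small readout layer is optimized. The paper's exact architecture is not reproduced here, so the following is only a minimal sketch of that general idea; the class name, layer sizes, activation, and use of PyTorch are all assumptions for illustration.

```python
# Minimal sketch of a policy network with a frozen, randomly initialized
# feature layer: only the final linear "readout" is trained. This is a generic
# illustration of randomized-network policy learning, not the authors' exact
# architecture; names and dimensions below are assumptions.
import torch
import torch.nn as nn

class RandomizedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=512):
        super().__init__()
        # Random feature layer: weights are sampled at construction and never updated.
        self.random_features = nn.Linear(obs_dim, hidden_dim)
        for p in self.random_features.parameters():
            p.requires_grad_(False)
        # Only this readout layer receives gradients during training.
        self.readout = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs):
        h = torch.tanh(self.random_features(obs))
        return self.readout(h)  # e.g., the mean of a Gaussian action distribution

# Usage sketch: only the readout parameters are handed to the optimizer,
# which is what shrinks the per-update computational cost.
# obs_dim and act_dim here are illustrative (act_dim=12 for a 12-motor quadruped).
policy = RandomizedPolicy(obs_dim=48, act_dim=12)
optimizer = torch.optim.Adam(policy.readout.parameters(), lr=3e-4)
```

Because the frozen layer acts as a fixed random feature map, each gradient step touches far fewer parameters than a fully trained deep network, which is consistent with the paper's claim of lower wall-clock training time despite needing more environment timesteps.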
Zhuochen Liu, Rahul Jain, Quan Nguyen
Computing Technology, Computer Technology
Zhuochen Liu, Rahul Jain, Quan Nguyen. Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning [EB/OL]. (2025-05-25) [2025-06-15]. https://arxiv.org/abs/2505.19054.