A finite time analysis of distributed Q-learning
Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario in which a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is the average of the local rewards. In particular, we provide a finite-time analysis of a distributed Q-learning algorithm, yielding a new sample complexity result of $\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{\epsilon^2} \frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4}, \; \frac{1}{\epsilon} \frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\rho_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\} \right)$ under a tabular lookup table setting.
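The abstract points to the consensus-based family of distributed Q-learning algorithms: each agent runs a local TD update using only its own reward, then averages its Q-table with neighbors through a mixing matrix $\boldsymbol{W}$, whose second-largest eigenvalue modulus $\rho_2(\boldsymbol{W})$ appears in the bound above. The following is a minimal sketch of that general template, not the paper's exact update rule; the environment (transition kernel `P`, local rewards `R`, fully connected mixing matrix `W`, uniform exploration policy) is a hypothetical placeholder.

```python
import numpy as np

# Sketch of consensus-based distributed Q-learning (assumed template).
# N agents, |S| states, |A| actions; agent i sees only its local reward
# r_i, while the team objective is the average reward (1/N) * sum_i r_i.
rng = np.random.default_rng(0)
N, S, A, gamma, alpha = 4, 5, 3, 0.9, 0.1

P = rng.dirichlet(np.ones(S), size=(S, A))  # P[s, a] = distribution over next states
R = rng.uniform(0.0, 1.0, size=(N, S, A))   # local rewards r_i(s, a), placeholder
W = np.full((N, N), 1.0 / N)                # doubly stochastic mixing matrix
Q = np.zeros((N, S, A))                     # one Q-table per agent

s = 0
for t in range(20000):
    a = rng.integers(A)                     # uniform exploration
    s_next = rng.choice(S, p=P[s, a])
    # 1) Local TD update: each agent uses only its own reward.
    for i in range(N):
        td = R[i, s, a] + gamma * Q[i, s_next].max() - Q[i, s, a]
        Q[i, s, a] += alpha * td
    # 2) Consensus step: average Q-tables with neighbors via W.
    Q = np.einsum('ij,jsa->isa', W, Q)
    s = s_next

# After sufficient mixing, each agent's Q-table tracks the Q-function of
# the averaged reward; the gap below is the residual consensus error.
print(np.abs(Q[0] - Q.mean(axis=0)).max())
```

The closer $\rho_2(\boldsymbol{W})$ is to 1 (a poorly connected communication graph), the slower the consensus step contracts disagreement between agents, which is why the $\frac{1}{1-\rho_2(\boldsymbol{W})}$ factor shows up in the sample complexity.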
Han-Dong Lim, Donghwan Lee
Computing technology; computer technology
Han-Dong Lim, Donghwan Lee. A finite time analysis of distributed Q-learning [EB/OL]. (2025-07-29) [2025-08-11]. https://arxiv.org/abs/2405.14078.