Sample Complexity of Variance-reduced Distributionally Robust Q-learning

Abstract

Dynamic decision-making under distributional shifts is of fundamental interest in the theory and applications of reinforcement learning: the distribution of the environment in which the data is collected can differ from that of the environment in which the model is deployed. This paper presents two novel model-free algorithms, namely distributionally robust Q-learning and its variance-reduced counterpart, that can effectively learn a robust policy despite distributional shifts. These algorithms are designed to efficiently approximate the $q$-function of an infinite-horizon $\gamma$-discounted robust Markov decision process with a Kullback-Leibler ambiguity set to an entry-wise $\epsilon$-degree of precision. Further, the variance-reduced distributionally robust Q-learning combines synchronous Q-learning with variance-reduction techniques to enhance its performance. Consequently, we establish that it attains a minimax sample complexity upper bound of $\tilde O(|\mathbf{S}||\mathbf{A}|(1-\gamma)^{-4}\epsilon^{-2})$, where $\mathbf{S}$ and $\mathbf{A}$ denote the state and action spaces. This is the first complexity result that is independent of the ambiguity size $\delta$, thereby providing new complexity-theoretic insights. Additionally, a series of numerical experiments confirms the theoretical findings and the efficiency of the algorithms in handling distributional shifts.
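To make the object being approximated concrete, the sketch below computes the robust $q$-function of a tiny tabular MDP with a Kullback-Leibler ambiguity set of size $\delta$. It is not the paper's algorithm: the abstract describes model-free, sample-based methods, whereas this illustration assumes the nominal transition kernel is known and simply iterates the robust Bellman operator. It relies on the standard scalar dual of the KL-constrained worst-case expectation, $\sup_{\alpha \ge 0}\{-\alpha \log E_p[e^{-v/\alpha}] - \alpha\delta\}$, which is an assumption brought in here rather than something stated in the abstract. Rewards are treated as deterministic, and the names `kl_robust_expectation`, `robust_q_iteration`, `P`, and `R`, as well as the two-state example, are hypothetical.

```python
import math


def kl_robust_expectation(p, v, delta):
    """Worst-case mean of v over {q : KL(q || p) <= delta}, via the scalar dual."""
    support = [(pi, vi) for pi, vi in zip(p, v) if pi > 0.0]
    if delta <= 0.0:
        # No ambiguity: fall back to the nominal expectation.
        return sum(pi * vi for pi, vi in support)
    v_min = min(vi for _, vi in support)
    v_max = max(vi for _, vi in support)
    if v_max == v_min:
        return v_min

    def dual(alpha):
        # -alpha * log E_p[exp(-v / alpha)] - alpha * delta,
        # shifted by v_min so every exponent is <= 0 (numerical stability).
        if alpha <= 0.0:
            return v_min  # limit of the dual objective as alpha -> 0+
        s = sum(pi * math.exp(-(vi - v_min) / alpha) for pi, vi in support)
        return v_min - alpha * math.log(s) - alpha * delta

    # The dual objective is concave in alpha and its maximizer lies in
    # [0, (v_max - v_min) / delta], so a ternary search suffices.
    lo, hi = 0.0, (v_max - v_min) / delta
    for _ in range(100):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return max(v_min, dual(0.5 * (lo + hi)))


def robust_q_iteration(P, R, gamma, delta, sweeps=200):
    """Iterate Q(s,a) <- R(s,a) + gamma * inf_q E_q[max_b Q(s',b)] (known model)."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + gamma * kl_robust_expectation(P[s][a], V, delta)
              for a in range(n_a)] for s in range(n_s)]
    return Q


# Hypothetical 2-state, 2-action MDP: P[s][a] is the nominal next-state law.
P = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.2, 0.8], [0.6, 0.4]]]
R = [[1.0, 0.0],
     [0.0, 0.5]]

Q_nominal = robust_q_iteration(P, R, gamma=0.9, delta=0.0)
Q_robust = robust_q_iteration(P, R, gamma=0.9, delta=0.1)
print("nominal Q:", Q_nominal)
print("robust  Q:", Q_robust)
```

The robust values are entry-wise no larger than the nominal ones, since the adversary may always choose the nominal distribution. The paper's contribution concerns how many samples are needed to estimate such a table without access to `P`, which this sketch does not address.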

Authors: Shengbo Wang, Jose Blanchet, Nian Si, Zhengyuan Zhou

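Subject classification: Computing technology, computer technology; Fundamental theory of automation; Automation technology and equipment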

Recommended citation: Shengbo Wang, Jose Blanchet, Nian Si, Zhengyuan Zhou. Sample Complexity of Variance-reduced Distributionally Robust Q-learning [EB/OL]. (2023-05-28) [2025-08-02]. https://arxiv.org/abs/2305.18420.
