Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits
Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits
We study the stochastic linear bandit problem with multiple arms over $T$ rounds, where the covariate dimension $d$ may exceed $T$, but each arm-specific parameter vector is $s$-sparse. We begin by analyzing the sequential estimation problem in the single-arm setting, focusing on cumulative mean-squared error. We show that Lasso estimators are provably suboptimal in the sequential setting, exhibiting suboptimal dependence on $d$ and $T$, whereas thresholded Lasso estimators -- obtained by applying least squares to the support selected by thresholding an initial Lasso estimator -- achieve the minimax rate. Building on these insights, we consider the full linear contextual bandit problem and propose a three-stage arm selection algorithm that uses thresholded Lasso as the main estimation method. We derive an upper bound on the cumulative regret of order $s(\log s)(\log d + \log T)$, and establish a matching lower bound up to a $\log s$ factor, thereby characterizing the minimax regret rate up to a logarithmic term in $s$. Moreover, when a short initial period is excluded from the regret, the proposed algorithm achieves exact minimax optimality.
Jingyu Liu、Yanglei Song
计算技术、计算机技术
Jingyu Liu,Yanglei Song.Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits[EB/OL].(2025-05-22)[2025-06-08].https://arxiv.org/abs/2505.17400.点此复制
评论