首页|Non-Stationary Lipschitz Bandits

Non-Stationary Lipschitz Bandits

来源：

英文摘要

We study the problem of non-stationary Lipschitz bandits, where the number of actions is infinite and the reward function, satisfying a Lipschitz assumption, can change arbitrarily over time. We design an algorithm that adaptively tracks the recently introduced notion of significant shifts, defined by large deviations of the cumulative reward function. To detect such reward changes, our algorithm leverages a hierarchical discretization of the action space. Without requiring any prior knowledge of the non-stationarity, our algorithm achieves a minimax-optimal dynamic regret bound of $\mathcal{\widetilde{O}}(\tilde{L}^{1/3}T^{2/3})$, where $\tilde{L}$ is the number of significant shifts and $T$ the horizon. This result provides the first optimal guarantee in this setting.

作者：Nicolas Nguyen、Solenne Gaucher、Claire Vernade

作者单位：

学科分类：计算技术、计算机技术

推荐引用：Nicolas Nguyen,Solenne Gaucher,Claire Vernade.Non-Stationary Lipschitz Bandits[EB/OL].(2025-05-24)[2025-06-18].https://arxiv.org/abs/2505.18871.点此复制

Non-Stationary Lipschitz Bandits

Non-Stationary Lipschitz Bandits

评论