首页|Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

来源：

英文摘要

We address differentially private stochastic bandit problems from the angles of exploring the deep connections among Thompson Sampling with Gaussian priors, Gaussian mechanisms, and Gaussian differential privacy (GDP). We propose DP-TS-UCB, a novel parametrized private bandit algorithm that enables to trade off privacy and regret. DP-TS-UCB satisfies $ \tilde{O} \left(T^{0.25(1-\alpha)}\right)$-GDP and enjoys an $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $\alpha \in [0,1]$ controls the trade-off between privacy and regret. Theoretically, our DP-TS-UCB relies on anti-concentration bounds of Gaussian distributions and links exploration mechanisms in Thompson Sampling-based algorithms and Upper Confidence Bound-based algorithms, which may be of independent interest.

作者：Bingshan Hu、Tianyue H. Zhang、Zhiming Huang、Mathias Lécuyer、Nidhi Hegde

作者单位：

学科分类：计算技术、计算机技术

推荐引用：Bingshan Hu,Tianyue H. Zhang,Zhiming Huang,Mathias Lécuyer,Nidhi Hegde.Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret[EB/OL].(2025-05-05)[2025-06-03].https://arxiv.org/abs/2505.02383.点此复制

Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret

评论