
Towards Optimal Differentially Private Regret Bounds in Linear MDPs

Source: arXiv
Abstract

We study regret minimization under privacy constraints in episodic inhomogeneous linear Markov Decision Processes (MDPs), motivated by the growing use of reinforcement learning (RL) in personalized decision-making systems that rely on sensitive user data. In this setting, both transition probabilities and reward functions are assumed to be linear in a feature mapping $\phi(s, a)$, and we aim to ensure privacy through joint differential privacy (JDP), a relaxation of differential privacy suited to online learning. Prior work has established suboptimal regret bounds by privatizing the LSVI-UCB algorithm, which achieves $\widetilde{O}(\sqrt{d^3 H^4 K})$ regret in the non-private setting. Building on recent advances that improve this to near minimax optimal regret $\widetilde{O}(d\sqrt{H^{3}K})$ via LSVI-UCB++ with Bernstein-style bonuses, we design a new differentially private algorithm by privatizing LSVI-UCB++ and adapting techniques for variance-aware analysis from offline RL. Our algorithm achieves a regret bound of $\widetilde{O}(d \sqrt{H^3 K} + H^{15/4} d^{7/6} K^{1/2} / \epsilon)$, improving over previous private methods. Empirical results show that our algorithm retains near-optimal utility compared to non-private baselines, indicating that privacy can be achieved with minimal performance degradation in this setting.
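For context, the linear MDP assumption referenced in the abstract (standard since Jin et al., 2020) states that, at each step $h$,

$$P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle, \qquad r_h(s, a) = \langle \phi(s, a), \theta_h \rangle,$$

for an unknown measure $\mu_h$ and an unknown parameter $\theta_h \in \mathbb{R}^d$; "inhomogeneous" means these may differ across the $H$ steps of an episode. This notation is a gloss on the abstract, not taken from the paper itself.

As a rough illustration of the privatization pattern common to private LSVI-style algorithms, the sketch below adds Gaussian noise to the least-squares statistics (the Gram matrix and the regression targets) before solving for Q-function weights. The function and variable names are hypothetical, and sigma stands in for a noise scale that an actual algorithm would calibrate to the JDP budget $(\epsilon, \delta)$, typically via a tree-based mechanism; this is not the paper's exact construction.

import numpy as np

def private_lsvi_weights(Lambda, u, sigma, rng):
    # Lambda: d x d Gram matrix  sum_k phi_k phi_k^T + lambda_reg * I
    # u:      d-vector of regression targets  sum_k phi_k * y_k
    # sigma:  Gaussian noise scale (would be set by the (epsilon, delta) budget)
    d = Lambda.shape[0]
    # Symmetrized Gaussian noise keeps the perturbed Gram matrix symmetric.
    N = rng.normal(scale=sigma, size=(d, d))
    Lambda_priv = Lambda + (N + N.T) / np.sqrt(2)
    u_priv = u + rng.normal(scale=sigma, size=d)
    # Private least-squares weights; in a real algorithm an enlarged
    # regularizer keeps Lambda_priv positive definite with high probability.
    return np.linalg.solve(Lambda_priv, u_priv)

# Toy usage with synthetic statistics:
rng = np.random.default_rng(0)
d = 8
Phi = rng.normal(size=(100, d))
y = rng.normal(size=100)
w = private_lsvi_weights(Phi.T @ Phi + np.eye(d), Phi.T @ y, sigma=0.1, rng=rng)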

Sharan Sahu

Subject: Computing Technology; Computer Technology

Sharan Sahu. Towards Optimal Differentially Private Regret Bounds in Linear MDPs [EB/OL]. (2025-04-12) [2025-05-24]. https://arxiv.org/abs/2504.09339.
