首页|Neural Logistic Bandits

Neural Logistic Bandits

来源：

英文摘要

We study the problem of neural logistic bandits, where the main task is to learn an unknown reward function within a logistic link function using a neural network. Existing approaches either exhibit unfavorable dependencies on $\kappa$, where $1/\kappa$ represents the minimum variance of reward distributions, or suffer from direct dependence on the feature dimension $d$, which can be huge in neural network-based settings. In this work, we introduce a novel Bernstein-type inequality for self-normalized vector-valued martingales that is designed to bypass a direct dependence on the ambient dimension. This lets us deduce a regret upper bound that grows with the effective dimension $\widetilde{d}$, not the feature dimension, while keeping a minimal dependence on $\kappa$. Based on the concentration inequality, we propose two algorithms, NeuralLog-UCB-1 and NeuralLog-UCB-2, that guarantee regret upper bounds of order $\widetilde{O}(\widetilde{d}\sqrt{\kappa T})$ and $\widetilde{O}(\widetilde{d}\sqrt{T/\kappa})$, respectively, improving on the existing results. Lastly, we report numerical results on both synthetic and real datasets to validate our theoretical findings.

作者：Dabeen Lee、Seoungbin Bae

作者单位：

学科分类：计算技术、计算机技术

推荐引用：Dabeen Lee,Seoungbin Bae.Neural Logistic Bandits[EB/OL].(2025-05-04)[2025-06-07].https://arxiv.org/abs/2505.02069.点此复制

Neural Logistic Bandits

Neural Logistic Bandits

评论