|国家预印本平台
首页|High-dimensional Nonparametric Contextual Bandit Problem

High-dimensional Nonparametric Contextual Bandit Problem

High-dimensional Nonparametric Contextual Bandit Problem

来源:Arxiv_logoArxiv
英文摘要

We consider the kernelized contextual bandit problem with a large feature space. This problem involves $K$ arms, and the goal of the forecaster is to maximize the cumulative rewards through learning the relationship between the contexts and the rewards. It serves as a general framework for various decision-making scenarios, such as personalized online advertising and recommendation systems. Kernelized contextual bandits generalize the linear contextual bandit problem and offers a greater modeling flexibility. Existing methods, when applied to Gaussian kernels, yield a trivial bound of $O(T)$ when we consider $\Omega(\log T)$ feature dimensions. To address this, we introduce stochastic assumptions on the context distribution and show that no-regret learning is achievable even when the number of dimensions grows up to the number of samples. Furthermore, we analyze lenient regret, which allows a per-round regret of at most $\Delta > 0$. We derive the rate of lenient regret in terms of $\Delta$.

Shogo Iwazaki、Junpei Komiyama、Masaaki Imaizumi

计算技术、计算机技术

Shogo Iwazaki,Junpei Komiyama,Masaaki Imaizumi.High-dimensional Nonparametric Contextual Bandit Problem[EB/OL].(2025-05-20)[2025-06-14].https://arxiv.org/abs/2505.14102.点此复制

评论