Partially Observable Contextual Bandits with Linear Payoffs

Abstract

The standard contextual bandit framework assumes fully observable and actionable contexts. In this work, we consider a new bandit setting with partially observable, correlated contexts and linear payoffs, motivated by applications in finance, where decision making is based on market information that typically displays temporal correlation and is not fully observed. We make the following contributions, which marry ideas from statistical signal processing with bandits: (i) We propose an algorithmic pipeline named EMKF-Bandit, which integrates system identification, filtering, and classic contextual bandit algorithms into an iterative method that alternates between latent parameter estimation and decision making. (ii) We analyze EMKF-Bandit when Thompson sampling is selected as the bandit algorithm and show that it incurs sub-linear regret under conditions on filtering. (iii) We conduct numerical simulations that demonstrate the benefits and practical applicability of the proposed pipeline.
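As a rough illustration of the filter-then-decide loop described in the abstract, the following sketch pairs a standard Kalman filter with per-arm linear Thompson sampling. It is not the authors' EMKF-Bandit implementation. Everything here is an assumption made for illustration: the linear-Gaussian context model, the matrices `A`, `C`, `Q`, `R`, the per-arm parameters `thetas`, and the names `kalman_step`, `LinearThompsonArm`, and `run_sketch`. In particular, the sketch treats the system matrices as known and so omits the system identification step entirely.

```python
import numpy as np


def kalman_step(x_hat, P, y, A, C, Q, R):
    """One predict/update step of a standard Kalman filter."""
    # Predict the latent context and its covariance one step ahead.
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q
    # Correct the prediction with the partial observation y.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ C) @ P_pred
    return x_new, P_new


class LinearThompsonArm:
    """Gaussian posterior over one arm's linear payoff parameter."""

    def __init__(self, dim, prior_var=1.0, noise_var=0.25):
        self.precision = np.eye(dim) / prior_var
        self.b = np.zeros(dim)
        self.noise_var = noise_var

    def sample(self, rng):
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        return rng.multivariate_normal(cov @ self.b, cov)

    def update(self, x, reward):
        # Bayesian linear regression update using the filtered context.
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var


def run_sketch(T=200, seed=0):
    rng = np.random.default_rng(seed)
    d, m, n_arms = 2, 1, 3
    # Hypothetical system matrices, assumed known (no EM step here).
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    C = np.array([[1.0, 0.0]])  # only the first coordinate is observed
    Q, R = 0.05 * np.eye(d), 0.1 * np.eye(m)
    thetas = rng.normal(size=(n_arms, d))  # hypothetical true arm parameters
    x, x_hat, P = np.zeros(d), np.zeros(d), np.eye(d)
    arms = [LinearThompsonArm(d) for _ in range(n_arms)]
    regret = 0.0
    for _ in range(T):
        # The latent context evolves; the learner sees only a noisy projection.
        x = A @ x + rng.multivariate_normal(np.zeros(d), Q)
        y = C @ x + rng.multivariate_normal(np.zeros(m), R)
        x_hat, P = kalman_step(x_hat, P, y, A, C, Q, R)
        # Thompson sampling: score each arm with a posterior draw.
        a = int(np.argmax([arm.sample(rng) @ x_hat for arm in arms]))
        reward = thetas[a] @ x + rng.normal(scale=0.5)
        arms[a].update(x_hat, reward)
        regret += np.max(thetas @ x) - thetas[a] @ x
    return regret, x_hat, P
```

Calling `run_sketch()` returns the cumulative regret against the best arm for the true latent context, together with the final filtered estimate and its covariance. A fuller version would re-estimate `A`, `C`, `Q`, and `R` from the observations between decision rounds, which is the role the abstract assigns to system identification.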

Authors: Alec Koppel, Sihan Zeng, Sumitra Ganesh, Sujay Bhatt

Subject classification: Computing technology, computer technology

Recommended citation: Alec Koppel, Sihan Zeng, Sumitra Ganesh, Sujay Bhatt. Partially Observable Contextual Bandits with Linear Payoffs [EB/OL]. (2024-09-17) [2025-08-02]. https://arxiv.org/abs/2409.11521.
