Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents
Motivated by applications such as online labor markets we consider a variant of the stochastic multi-armed bandit problem where we have a collection of arms representing strategic agents with different performance characteristics. The platform (principal) chooses an agent in each round to complete a task. Unlike the standard setting, when an arm is pulled it can modify its reward by absorbing it or improving it at the expense of a higher cost. The principle has to solve a mechanism design problem to incentivize the arms to give their best performance. However, since even with an effective mechanism agents may still deviate from rational behavior, the principal wants a robust algorithm that also gives a non-vacuous guarantee on the total accumulated rewards under non-equilibrium behavior. In this paper, we introduce a class of bandit algorithms that meet the two objectives of performance incentivization and robustness simultaneously. We do this by identifying a collection of intuitive properties that a bandit algorithm has to satisfy to achieve these objectives. Finally, we show that settings where the principal has no information about the arms' performance characteristics can be handled by combining ideas from second price auctions with our algorithms.
Suho Shin、Seyed A. Esmaeili、Aleksandrs Slivkins
计算技术、计算机技术
Suho Shin,Seyed A. Esmaeili,Aleksandrs Slivkins.Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents[EB/OL].(2023-12-13)[2025-05-05].https://arxiv.org/abs/2312.07929.点此复制
评论