|国家预印本平台
首页|Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits

来源:Arxiv_logoArxiv
英文摘要

Dueling bandit is a variant of the Multi-armed bandit to learn the binary relation by comparisons. Most work on the dueling bandit has targeted transitive relations, that is, totally/partially ordered sets, or assumed at least the existence of a champion such as Condorcet winner and Copeland winner. This work develops an analysis of dueling bandits for non-transitive relations. Jan-ken (a.k.a. rock-paper-scissors) is a typical example of a non-transitive relation. It is known that a rational player chooses one of three items uniformly at random, which is known to be Nash equilibrium in game theory. Interestingly, any variant of Jan-ken with four items (e.g., rock, paper, scissors, and well) contains at least one useless item, which is never selected by a rational player. This work investigates a dueling bandit problem to identify whether all $n$ items are indispensable in a given win-lose relation. Then, we provide upper and lower bounds of the sample complexity of the identification problem in terms of the determinant of $A$ and a solution of $\mathbf{x}^{\top} A = \mathbf{0}^{\top}$ where $A$ is an $n \times n$ pay-off matrix that every duel follows.

Shang Lu、Shuji Kijima

计算技术、计算机技术

Shang Lu,Shuji Kijima.Sample Complexity of Identifying the Nonredundancy of Nontransitive Games in Dueling Bandits[EB/OL].(2025-05-08)[2025-06-10].https://arxiv.org/abs/2505.05014.点此复制

评论