|国家预印本平台
首页|Robust Online Learning with Private Information

Robust Online Learning with Private Information

Robust Online Learning with Private Information

来源:Arxiv_logoArxiv
英文摘要

This paper investigates the robustness of online learning algorithms when learners possess private information. No-external-regret algorithms, prevalent in machine learning, are vulnerable to strategic manipulation, allowing an adaptive opponent to extract full surplus. Even standard no-weak-external-regret algorithms, designed for optimal learning in stationary environments, exhibit similar vulnerabilities. This raises a fundamental question: can a learner simultaneously prevent full surplus extraction by adaptive opponents while maintaining optimal performance in well-behaved environments? To address this, we model the problem as a two-player repeated game, where the learner with private information plays against the environment, facing ambiguity about the environment's types: stationary or adaptive. We introduce \emph{partial safety} as a key design criterion for online learning algorithms to prevent full surplus extraction. We then propose the \emph{Explore-Exploit-Punish} (\textsf{EEP}) algorithm and prove that it satisfies partial safety while achieving optimal learning in stationary environments, and has a variant that delivers improved welfare performance. Our findings highlight the risks of applying standard online learning algorithms in strategic settings with adverse selection. We advocate for a shift toward online learning algorithms that explicitly incorporate safeguards against strategic manipulation while ensuring strong learning performance.

Kyohei Okumura

计算技术、计算机技术

Kyohei Okumura.Robust Online Learning with Private Information[EB/OL].(2025-05-08)[2025-05-25].https://arxiv.org/abs/2505.05341.点此复制

评论