Incentives for Responsiveness, Instrumental Control and Impact

Source: arXiv
Abstract

We introduce three concepts that describe an agent's incentives: response incentives indicate which variables in the environment, such as sensitive demographic information, affect the decision under the optimal policy. Instrumental control incentives indicate whether an agent's policy is chosen to manipulate part of its environment, such as the preferences or instructions of a user. Impact incentives indicate which variables an agent will affect, intentionally or otherwise. For each concept, we establish sound and complete graphical criteria, and discuss general classes of techniques that may be used to produce incentives for safe and fair agent behaviour. Finally, we outline how these notions may be generalised to multi-decision settings. This journal-length paper extends our conference publications "Incentives for Responsiveness, Instrumental Control and Impact" and "Agent Incentives: A Causal Perspective": the material on response incentives and instrumental control incentives is updated, while the work on impact incentives and multi-decision settings is entirely new.

Eric Langlois, Ryan Carey, Chris van Merwijk, Shane Legg, Tom Everitt

Subject: Fundamental theory of automation

Eric Langlois, Ryan Carey, Chris van Merwijk, Shane Legg, Tom Everitt. Incentives for Responsiveness, Instrumental Control and Impact [EB/OL]. (2025-06-23) [2025-08-02]. https://arxiv.org/abs/2001.07118.
