Learning Fair and Efficient Policies in Sequential Public Goods Dilemmas
In social dilemmas, rational individuals can obtain higher rewards in the short term by defecting, but this lowers collective utility and can even cause the task to fail. Many recent works induce cooperative behavior in social dilemmas, but they work only in stateless matrix games and fail in sequential social dilemmas. In sequential social dilemmas involving a large number of players and complex states, cooperation is no longer a single one-step action choice and is therefore harder to learn. In decentralized multi-agent reinforcement learning, some works incorporate payoff equality into agents' reward signals to prevent some agents from taking up too many resources and starving others. However, pursuing payoff equality does not lead to effective cooperative strategies in social dilemmas, because it forces well-learned agents to sacrifice their high efficiency for equality when some agents perform extremely poorly. In this work, we consider sequential public goods dilemmas, in which group members can voluntarily contribute to the public welfare and such contributions are needed to obtain a high collective payoff. By taking fairness into account during training, well-learned agents obtain adequate rewards without being constrained by the policies of others, while the laggards have more opportunities to gain useful experience owing to sufficient public goods. We empirically show that our method performs well in terms of both collective efficiency and fairness. Compared to baselines, our agents acquire more general and sustainable policies in sequential public goods dilemmas.
陈莘宁、陈奕天、刘璇、张士庚
Computing technology, computer technology; Automation technology and equipment; Basic theory of automation
Artificial intelligence; Multi-agent reinforcement learning; Social dilemmas; Fairness and efficiency
陈莘宁, 陈奕天, 刘璇, 张士庚. Learning Fair and Efficient Policies in Sequential Public Goods Dilemmas [EB/OL]. (2023-05-19) [2025-08-21]. http://www.paper.edu.cn/releasepaper/content/202305-156.