
A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning

Source: arXiv
Abstract

In this paper, we propose a maximum mutual information (MMI) framework for multi-agent reinforcement learning (MARL) that enables multiple agents to learn coordinated behaviors by regularizing the accumulated return with the mutual information between their actions. By introducing a latent variable to induce nonzero mutual information between actions and applying a variational bound, we derive a tractable lower bound on the MMI-regularized objective function. Applying policy iteration to maximize this lower bound, we obtain a practical algorithm named variational maximum mutual information multi-agent actor-critic (VM3-AC), which follows the paradigm of centralized training with decentralized execution (CTDE). We evaluated VM3-AC on several games requiring coordination, and the numerical results show that it outperforms MADDPG and other MARL algorithms on such multi-agent tasks.
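
The objective described in the abstract can be sketched as follows. This is a minimal reconstruction from the abstract alone: the trade-off coefficient \alpha, the pairwise form of the regularizer, and the variational distribution q_\xi are assumed notation, not taken from the paper.

% MMI-regularized return: the per-step reward is augmented with the
% mutual information between the agents' actions, weighted by alpha.
J(\pi) = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Bigl(r_t + \alpha \sum_{i \neq j} I\bigl(a_t^{i}; a_t^{j}\bigr)\Bigr)\right]

% Standard Barber-Agakov variational lower bound that makes the MI
% term tractable, with q_xi a learned variational conditional:
I\bigl(a^{i}; a^{j}\bigr) = H\bigl(a^{i}\bigr) - H\bigl(a^{i} \mid a^{j}\bigr) \geq H\bigl(a^{i}\bigr) + \mathbb{E}_{p(a^{i}, a^{j})}\bigl[\log q_{\xi}\bigl(a^{i} \mid a^{j}\bigr)\bigr]

Under this reading, the latent variable mentioned in the abstract would enter by conditioning each agent's policy on a shared z, e.g. \pi^{i}(a^{i} \mid o^{i}, z), so that actions become correlated through z and their mutual information is nonzero; maximizing the variational lower bound in place of the intractable MI term yields the tractable surrogate that policy iteration then optimizes.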

Youngchul Sung, Whiyoung Jung, Myungsik Cho, Woojun Kim

Computing technology; computer technology

Youngchul Sung, Whiyoung Jung, Myungsik Cho, Woojun Kim. A Maximum Mutual Information Framework for Multi-Agent Reinforcement Learning [EB/OL]. (2020-06-04) [2025-08-02]. https://arxiv.org/abs/2006.02732.
