Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning
Value function decomposition methods for cooperative multi-agent reinforcement learning compose joint values from individual per-agent utilities and train them with a joint objective. To keep action selection consistent between individual utilities and joint values, the composition must satisfy the individual-global-max (IGM) property. Although satisfying IGM itself is straightforward, most existing methods (e.g., VDN, QMIX) have limited representational capacity and cannot represent the full class of IGM values, while the one exception without this limitation (QPLEX) is unnecessarily complex. In this work, we present a simple formulation of the full class of IGM values that naturally leads to QFIX, a novel family of value function decomposition models that expands the representational capacity of prior models by means of a thin "fixing" layer. We derive multiple variants of QFIX and implement three of them in two well-known multi-agent frameworks. An empirical evaluation on several SMACv2 and Overcooked environments confirms that QFIX (i) enhances the performance of prior methods, (ii) learns more stably and performs better than its main competitor QPLEX, and (iii) achieves this while using the simplest and smallest mixing models.
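For context, the IGM property referenced in the abstract is commonly stated as follows (a sketch using the notation standard in the value decomposition literature, where $Q_{jt}$ is the joint value, $Q_i$ are the per-agent utilities, and $\boldsymbol{\tau}$, $\mathbf{a}$ are the joint action-observation history and joint action; the paper's own notation may differ):

$$\arg\max_{\mathbf{a}} Q_{jt}(\boldsymbol{\tau}, \mathbf{a}) = \begin{pmatrix} \arg\max_{a_1} Q_1(\tau_1, a_1) \\ \vdots \\ \arg\max_{a_n} Q_n(\tau_n, a_n) \end{pmatrix}$$

That is, the joint greedy action is obtained by each agent greedily maximizing its own utility, which is what keeps decentralized action selection consistent with the centrally trained joint value.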
Rupali Bhati, Shuo Liu, Aathira Pillai, Christopher Amato, Andrea Baisero
Computing technology, computer technology
Rupali Bhati, Shuo Liu, Aathira Pillai, Christopher Amato, Andrea Baisero. Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning [EB/OL]. (2025-05-15) [2025-06-17]. https://arxiv.org/abs/2505.10484.