GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning
We propose GAM-Agent, a game-theoretic multi-agent framework for enhancing vision-language reasoning. Unlike prior single-agent or monolithic models, GAM-Agent formulates the reasoning process as a non-zero-sum game between base agents, each specializing in a visual perception subtask, and a critical agent that verifies logical consistency and factual correctness. Agents communicate via structured claims, evidence, and uncertainty estimates. The framework introduces an uncertainty-aware controller that dynamically adjusts agent collaboration, triggering multi-round debates when disagreement or ambiguity is detected. This process yields more robust and interpretable predictions. Experiments on four challenging benchmarks (MMMU, MMBench, MVBench, and V*Bench) demonstrate that GAM-Agent significantly improves performance across various VLM backbones. Notably, GAM-Agent boosts the accuracy of small-to-mid-scale models (e.g., Qwen2.5-VL-7B, InternVL3-14B) by 5-6%, and still enhances strong models like GPT-4o by up to 2-3%. Our approach is modular, scalable, and generalizable, offering a path toward reliable and explainable multi-agent multimodal reasoning.
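The abstract does not specify how the controller decides when to debate, so the following is only a minimal sketch of the general idea under stated assumptions. The names `Claim`, `needs_debate`, `aggregate`, and `collaborate`, the 0.4 uncertainty threshold, the three-round cap, and the confidence-weighted vote are all hypothetical choices made for illustration, not the authors' method. The critical agent is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Claim:
    """A structured message from one agent (hypothetical schema)."""
    answer: str
    evidence: str
    uncertainty: float  # assumed scale: 0.0 = certain, 1.0 = maximally unsure


# An agent maps (question, peer claims from the previous round) to a new claim.
Agent = Callable[[str, List[Claim]], Claim]


def needs_debate(claims: List[Claim], max_uncertainty: float = 0.4) -> bool:
    """Trigger a debate on disagreement or on high mean uncertainty.

    The threshold value is an assumption chosen for illustration.
    """
    answers = {c.answer for c in claims}
    mean_u = sum(c.uncertainty for c in claims) / len(claims)
    return len(answers) > 1 or mean_u > max_uncertainty


def aggregate(claims: List[Claim]) -> str:
    """Confidence-weighted vote: each claim counts with weight 1 - uncertainty."""
    weights: Counter = Counter()
    for c in claims:
        weights[c.answer] += 1.0 - c.uncertainty
    return max(weights, key=weights.get)


def collaborate(question: str, agents: List[Agent],
                max_rounds: int = 3) -> Tuple[str, int]:
    """Collect initial claims, debate while needed, then aggregate.

    Returns the final answer and the number of debate rounds that ran.
    """
    claims = [agent(question, []) for agent in agents]
    rounds = 0
    while rounds < max_rounds and needs_debate(claims):
        # Each agent revises its claim after seeing all claims from the last round.
        claims = [agent(question, claims) for agent in agents]
        rounds += 1
    return aggregate(claims), rounds
```

In this sketch, agents that already agree with low uncertainty skip the debate loop entirely, which mirrors the abstract's description of debates being triggered only when disagreement or ambiguity is detected.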
Jusheng Zhang, Yijia Fan, Wenjun Lin, Ruiqi Chen, Haoyi Jiang, Wenhao Chai, Jian Wang, Keze Wang
Computing Technology, Computer Technology
Jusheng Zhang, Yijia Fan, Wenjun Lin, Ruiqi Chen, Haoyi Jiang, Wenhao Chai, Jian Wang, Keze Wang. GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning[EB/OL]. (2025-05-29)[2025-06-07]. https://arxiv.org/abs/2505.23399.