Secret Collusion among AI Agents: Multi-Agent Deception via Steganography

English Abstract

Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks. This poses privacy and security challenges concerning the unauthorised sharing of information, or other unwanted forms of agent coordination. Modern steganographic techniques could render such dynamics hard to detect. In this paper, we comprehensively formalise the problem of secret collusion in systems of generative AI agents by drawing on relevant concepts from both AI and security literature. We study incentives for the use of steganography, and propose a variety of mitigation measures. Our investigations result in a model evaluation framework that systematically tests capabilities required for various forms of secret collusion. We provide extensive empirical results across a range of contemporary LLMs. While the steganographic capabilities of current models remain limited, GPT-4 displays a capability jump suggesting the need for continuous monitoring of steganographic frontier model capabilities. We conclude by laying out a comprehensive research program to mitigate future risks of collusion between generative AI models.

Authors: Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, Christian Schroeder de Witt

Subject classification: Computing technology, computer technology

Recommended citation: Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, Christian Schroeder de Witt. Secret Collusion among AI Agents: Multi-Agent Deception via Steganography [EB/OL]. (2025-07-25) [2025-08-04]. https://arxiv.org/abs/2402.07510.
