
AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

Source: arXiv
Abstract

Scaffolding Large Language Models (LLMs) into multi-agent systems often improves performance on complex tasks, but the safety impact of such scaffolds has not been thoroughly explored. We introduce AgentBreeder, a framework for multi-objective self-improving evolutionary search over scaffolds. We evaluate discovered scaffolds on widely recognized reasoning, mathematics, and safety benchmarks and compare them with popular baselines. In 'blue' mode, we see a 79.4% average uplift in safety benchmark performance while maintaining or improving capability scores. In 'red' mode, we find adversarially weak scaffolds emerging concurrently with capability optimization. Our work demonstrates the risks of multi-agent scaffolding and provides a framework for mitigating them. Code is available at https://github.com/J-Rosser-UK/AgentBreeder.
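To make the abstract's core idea concrete, the sketch below illustrates a multi-objective self-improving evolutionary search of the kind described: candidate scaffolds are scored on both a capability objective and a safety objective, non-dominated (Pareto-optimal) scaffolds are kept, and mutated variants refill the population. All names, scoring functions, and parameters here are invented placeholders for illustration; the actual AgentBreeder implementation is in the linked repository.

```python
import random

def evaluate(scaffold):
    """Stand-in for running capability and safety benchmarks on a scaffold.
    Real scores would come from benchmark evaluations, not these formulas."""
    capability = sum(scaffold) / len(scaffold)           # placeholder score
    safety = 1.0 - max(scaffold) + 0.5 * min(scaffold)   # placeholder score
    return capability, safety

def dominates(a, b):
    """Pareto dominance: a is no worse than b on every objective and
    strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def mutate(scaffold):
    """Stand-in for an LLM proposing a modified scaffold ('self-improvement')."""
    child = list(scaffold)
    i = random.randrange(len(child))
    child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))
    return child

def pareto_front(population, scores):
    """Keep every scaffold whose score pair is not dominated by another's."""
    return [population[i] for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

def evolve(generations=20, pop_size=10, genome_len=4):
    population = [[random.random() for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scores = [evaluate(s) for s in population]
        parents = pareto_front(population, scores)   # non-dominated survivors
        children = [mutate(random.choice(parents))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return pareto_front(population, [evaluate(s) for s in population])

if __name__ == "__main__":
    random.seed(0)
    for s in evolve():
        cap, safe = evaluate(s)
        print(f"capability={cap:.3f}  safety={safe:.3f}")
```

In this framing, the paper's 'blue' mode corresponds to selecting for high safety alongside capability, while 'red' mode corresponds to optimizing capability alone, under which adversarially weak scaffolds can survive selection.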

J Rosser, Jakob Nicolaus Foerster

Subjects: Safety Science; Computing Technology, Computer Technology

J Rosser, Jakob Nicolaus Foerster. AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement [EB/OL]. (2025-06-25) [2025-07-25]. https://arxiv.org/abs/2502.00757.
