The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems

Source: arXiv
Abstract

We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities. We focus on generating human-imperceptible adversarial examples and introduce a novel imperceptible retrieve-to-generate attack against RAG. This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set, in order to influence the final answer generation. To address this task, we propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG and continuously refines attack strategies based on relevance-generation-naturalness rewards. Experiments on newly constructed factual and non-factual question-answering benchmarks demonstrate that ReGENT significantly outperforms existing attack methods in misleading RAG systems with small imperceptible text perturbations.
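The retrieval half of the task described in the abstract can be illustrated with a toy check: a target document that sits outside the initial top-$k$ should enter the top-$k$ after perturbation. The sketch below is only an illustration under stated assumptions and is not the ReGENT method. It assumes the perturbation is applied to the target document's text, and it swaps in a bag-of-words cosine scorer for the real black-box retriever. The names `cosine`, `top_k`, and `attack_succeeds` are hypothetical, and the generation and naturalness criteria from the abstract are not modelled.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, docs: dict, k: int) -> list:
    """Return the ids of the k documents most similar to the query.

    Stand-in for the black-box retriever (assumption: the attacker
    only observes the returned ranking).
    """
    q = Counter(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: cosine(q, Counter(docs[d].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def attack_succeeds(query: str, docs: dict, target_id: str,
                    perturbed_text: str, k: int) -> bool:
    """True if the target is outside the top-k before the perturbation
    and inside the top-k after it (retrieval criterion only)."""
    before = top_k(query, docs, k)
    after = top_k(query, {**docs, target_id: perturbed_text}, k)
    return target_id not in before and target_id in after


# Toy corpus; the perturbation here is deliberately crude and would
# not count as imperceptible.
docs = {
    "d1": "paris is the capital of france",
    "d2": "france capital city paris",
    "d3": "berlin is large",
}
query = "capital of france"
result = attack_succeeds(query, docs, "d3", "berlin is large capital of france", k=2)
```

In this toy setting the check passes because the appended query terms raise the target's similarity score above that of `d2`. A real attack would additionally have to keep the edit small and natural and would be judged on its effect on the generated answer.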

Hongru Song, Yu-an Liu, Ruqing Zhang, Jiafeng Guo, Jianming Lv, Maarten de Rijke, Xueqi Cheng

Computing technology, computer technology

Hongru Song, Yu-an Liu, Ruqing Zhang, Jiafeng Guo, Jianming Lv, Maarten de Rijke, Xueqi Cheng. The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [EB/OL]. (2025-05-24) [2025-06-08]. https://arxiv.org/abs/2505.18583.
