National Preprint Platform (国家预印本平台)

Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling


Source: arXiv
English Abstract

Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects, actions, or concepts, posing significant risks in safety-critical applications. Existing hallucination mitigation methods typically follow one of two paradigms: generation adjustment, which modifies decoding behavior to align text with visual inputs, and post-hoc verification, where external models assess and correct outputs. While effective, generation adjustment methods often rely on heuristics and lack correction mechanisms, while post-hoc verification is complicated, typically requiring multiple models and tending to reject outputs rather than refine them. In this work, we introduce REVERSE, a unified framework that integrates hallucination-aware training with on-the-fly self-verification. By leveraging a new hallucination-verification dataset containing over 1.3M semi-synthetic samples, along with a novel inference-time retrospective resampling technique, our approach enables VLMs to both detect hallucinations during generation and dynamically revise those hallucinations. Our evaluations show that REVERSE achieves state-of-the-art hallucination reduction, outperforming the best existing methods by up to 12% on CHAIR-MSCOCO and 28% on HaloQuest. Our dataset, model, and code are available at: https://reverse-vlm.github.io.
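As a rough illustration of the retrospective resampling idea described above — not the paper's actual implementation — the following toy sketch shows a decoding loop in which a verifier flags suspect tokens during generation, and the decoder backtracks and resamples instead of emitting them. All function names (`propose_token`, `is_hallucinated`) and the toy vocabulary are hypothetical stand-ins for the VLM's decoder and the learned verification signal.

```python
def generate_with_retrospective_resampling(
    propose_token,      # fn(prefix, attempt) -> next token, or None at end of sequence
    is_hallucinated,    # fn(prefix, token) -> bool; stand-in for the learned verifier
    max_len=10,
    max_retries=3,
):
    """Toy sketch: generate token by token; when the verifier flags a token,
    drop it (backtrack) and resample, up to `max_retries` extra attempts."""
    tokens = []
    while len(tokens) < max_len:
        for attempt in range(max_retries + 1):
            candidate = propose_token(tokens, attempt)
            if candidate is None:
                return tokens  # decoder signals end of sequence
            if not is_hallucinated(tokens, candidate):
                tokens.append(candidate)  # verified token: keep and continue
                break
        else:
            # every resample was flagged: stop rather than emit a hallucination
            break
    return tokens


# Hypothetical usage: the second position first proposes a hallucinated
# object ("unicorn"), which the verifier rejects; resampling yields "dog".
vocab = [["a"], ["unicorn", "dog"], ["sits"], [None]]

def propose_token(prefix, attempt):
    options = vocab[len(prefix)]
    return options[min(attempt, len(options) - 1)]

def is_hallucinated(prefix, token):
    return token == "unicorn"

print(generate_with_retrospective_resampling(propose_token, is_hallucinated))
# → ['a', 'dog', 'sits']
```

The key design point this illustrates is that verification is interleaved with generation (detect-then-revise in a single pass), rather than rejecting the whole output after the fact as post-hoc verification pipelines do.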

Heekyung Lee, Trevor Darrell, Jiaxin Ge, Joseph E. Gonzalez, David M. Chan, Tsung-Han Wu

Subject: Computing Technology, Computer Science

Heekyung Lee, Trevor Darrell, Jiaxin Ge, Joseph E. Gonzalez, David M. Chan, Tsung-Han Wu. Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling [EB/OL]. (2025-04-17) [2025-05-11]. https://arxiv.org/abs/2504.13169.
