Fact-Based Hallucination Mitigation for Large Vision-Language Models
(Chinese title: 事实驱动的图文大模型幻觉缓解算法)
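Qiu Weijie 1, Jiang Zhuqing 1
Author information
- 1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876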
Abstract
Large vision-language models (LVLMs) jointly model visual and linguistic representations and have significantly improved multimodal understanding. However, ensuring that generated content stays faithful to the visual evidence remains a fundamental challenge. Current work on hallucination mitigation relies mainly on two approaches: increasingly complex model fine-tuning, or decoding-stage heuristics that add computational overhead or external dependencies. Departing from these paths, this paper revisits hallucination mitigation from the new perspective of fact-based output guidance. Specifically, the proposed method extracts fine-grained factual descriptions directly from the input image and embeds them in a structured prompt template to explicitly constrain the generation process. This fact-augmented prompting strategy is lightweight and training-free: it strengthens the model's visual grounding and makes its output more faithful. Experiments on multiple hallucination benchmarks show that the approach effectively suppresses hallucinations, consistently outperforming existing state-of-the-art methods while running more efficiently.
Keywords
Artificial Intelligence/Multimodal/Large Language Models/Hallucination/Prompt Engineering
Cite this article
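Qiu Weijie, Jiang Zhuqing. Fact-Based Hallucination Mitigation for Large Vision-Language Models [EB/OL]. (2026-02-12) [2026-02-14]. http://www.paper.edu.cn/releasepaper/content/202602-78.
Subject classification
Computing technology, computer technology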
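Illustrative sketch
The abstract describes extracting fine-grained factual descriptions from the input image and embedding them in a structured prompt template. The abstract gives neither the template nor the extractor, so the Python sketch below only illustrates the general idea under stated assumptions: the template wording, the function names `build_fact_prompt` and `answer_with_facts`, and the `extract_facts` and `generate` callables are all hypothetical placeholders, not the authors' implementation.

```python
from typing import Any, Callable, List

# Hypothetical template: the abstract does not give the authors' wording.
PROMPT_TEMPLATE = (
    "Facts extracted from the image:\n"
    "{facts}\n\n"
    "Answer using only the facts above and what is visible in the image.\n"
    "Question: {question}\n"
    "Answer:"
)


def build_fact_prompt(facts: List[str], question: str) -> str:
    """Embed fact statements in the structured prompt template."""
    # Drop blank entries and exact duplicates while keeping the original order.
    seen = set()
    cleaned = []
    for fact in facts:
        fact = fact.strip()
        if fact and fact not in seen:
            seen.add(fact)
            cleaned.append(fact)

    if cleaned:
        # Number the facts so the model can refer to them individually.
        fact_block = "\n".join(
            f"{i}. {fact}" for i, fact in enumerate(cleaned, start=1)
        )
    else:
        fact_block = "(no facts extracted)"

    return PROMPT_TEMPLATE.format(facts=fact_block, question=question.strip())


def answer_with_facts(
    image: Any,
    question: str,
    extract_facts: Callable[[Any], List[str]],
    generate: Callable[[Any, str], str],
) -> str:
    """Training-free pipeline: extract facts, build the prompt, then generate.

    `extract_facts` and `generate` are assumed callables supplied by the
    caller (for example a captioner or detector, and an LVLM wrapper).
    """
    facts = extract_facts(image)
    prompt = build_fact_prompt(facts, question)
    return generate(image, prompt)
```

Because both model-dependent steps are passed in as callables, the sketch requires no fine-tuning and no change to decoding, which matches the lightweight, training-free framing of the abstract. How the facts are actually extracted and how the prompt is actually worded are left open here.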