MIRA: A Novel Framework for Fusing Modalities in Medical RAG

Source: arXiv
Abstract

Multimodal Large Language Models (MLLMs) have significantly advanced AI-assisted medical diagnosis, but they often generate factually inconsistent responses that deviate from established medical knowledge. Retrieval-Augmented Generation (RAG) improves factual accuracy by integrating external sources, but it presents two key challenges. First, insufficient retrieval can miss critical information, whereas excessive retrieval can introduce irrelevant or misleading content that degrades the model's output. Second, even when the model would answer correctly on its own, over-reliance on retrieved data can lead to factual errors. To address these issues, we introduce the Multimodal Intelligent Retrieval and Augmentation (MIRA) framework, designed to optimize factual accuracy in MLLMs. MIRA consists of two key components: (1) a calibrated Rethinking and Rearrangement module that dynamically adjusts the number of retrieved contexts to manage factual risk, and (2) a medical RAG framework that integrates image embeddings and a medical knowledge base with a query-rewrite module for efficient multimodal reasoning. Together, these enable the model to effectively integrate both its inherent knowledge and external references. Our evaluation on publicly available medical VQA and report generation benchmarks demonstrates that MIRA substantially enhances factual accuracy and overall performance, achieving new state-of-the-art results. Code is released at https://github.com/mbzuai-oryx/MIRA.
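
The idea of adjusting the number of retrieved contexts, rather than always passing a fixed top-k, can be illustrated with a minimal sketch. This is not the paper's Rethinking and Rearrangement module; the function name `select_contexts`, the relative-score cutoff, and the `min_k`/`max_k` bounds are all hypothetical choices made for illustration, and similarity scores are assumed to be non-negative.

```python
from typing import List, Tuple


def select_contexts(
    scored: List[Tuple[str, float]],
    min_k: int = 1,
    max_k: int = 5,
    rel_threshold: float = 0.8,
) -> List[str]:
    """Pick a variable number of passages based on their retrieval scores.

    scored: (passage, similarity) pairs; higher similarity means more relevant.
    Assumption (illustrative only): scores are non-negative, and a passage is
    worth keeping if its score is at least rel_threshold times the best score.
    """
    if not scored:
        return []
    # Rank passages from most to least similar.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    best = ranked[0][1]
    # Count the passages that are close enough to the best match.
    close = sum(1 for _, score in ranked if score >= rel_threshold * best)
    # Clamp the count: at least min_k (guards against too little retrieval),
    # at most max_k (guards against flooding the prompt with weak matches).
    k = max(min_k, min(close, max_k))
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    hits = [("a", 0.9), ("b", 0.5), ("c", 0.85), ("d", 0.1)]
    print(select_contexts(hits))
```

With these example scores, only the two passages near the top score are kept, so weakly related passages never reach the prompt. A real system would replace the fixed cutoff with a calibrated criterion.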

Jinhong Wang, Tajamul Ashraf, Zongyan Han, Jorma Laaksonen, Rao Mohammad Anwer

Computing technology, computer technology; current state of medicine, medical development

Jinhong Wang, Tajamul Ashraf, Zongyan Han, Jorma Laaksonen, Rao Mohammad Anwer. MIRA: A Novel Framework for Fusing Modalities in Medical RAG [EB/OL]. (2025-07-10) [2025-07-17]. https://arxiv.org/abs/2507.07902.
