Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review

Abstract

Modern Vision-Language Models (VLMs) exhibit unprecedented capabilities in cross-modal semantic understanding between visual and textual modalities. Given the intrinsic need for multi-modal integration in clinical applications, VLMs have emerged as a promising solution for a wide range of medical image analysis tasks. However, adapting general-purpose VLMs to the medical domain poses numerous challenges, such as large domain gaps, complicated pathological variations, and the diversity and uniqueness of different tasks. The central purpose of this review is to systematically summarize recent advances in adapting VLMs for medical image analysis, analyze current challenges, and recommend promising yet urgent directions for further investigation. We begin by introducing core learning strategies for medical VLMs, including pretraining, fine-tuning, and prompt learning. We then categorize five major VLM adaptation strategies for medical image analysis. These strategies are further analyzed across eleven medical imaging tasks to illustrate their current practical implementations. Furthermore, we analyze key challenges that impede the effective adaptation of VLMs to clinical applications and discuss potential directions for future research. We also provide an open-access repository of related literature to facilitate further research, available at https://github.com/haonenglin/Awesome-VLM-for-MIA. We anticipate that this article will help researchers interested in harnessing VLMs for medical image analysis to better understand their capabilities and limitations, as well as current technical barriers, and thereby promote their innovative, robust, and safe application in clinical practice.

Authors: Haoneng Lin, Cheng Xu, Jing Qin

Subject classification: Current State and Development of Medicine; Medical Research Methods

Recommended citation: Haoneng Lin, Cheng Xu, Jing Qin. Taming Vision-Language Models for Medical Image Analysis: A Comprehensive Review [EB/OL]. (2025-06-23) [2025-07-16]. https://arxiv.org/abs/2506.18378.
