Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers a solution for AI-driven histopathology reporting, setting a strong foundation for future advancements in multimodal clinical applications.
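To make the retrieval-based in-context learning mechanism described above concrete, the following is a minimal sketch (not the authors' released code) of how semantically similar WSI-report pairs might be retrieved by cosine similarity over slide embeddings and assembled into a prompt for a vision-language model. All names (retrieve_icl_examples, build_icl_prompt, the embedding shapes, and k) are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of the retrieval step: given a query WSI embedding,
# fetch the k most semantically similar training WSI-report pairs and
# concatenate their reports as multimodal in-context examples.
import numpy as np

def retrieve_icl_examples(query_emb: np.ndarray,
                          train_embs: np.ndarray,
                          train_reports: list[str],
                          k: int = 3) -> list[tuple[int, str]]:
    """Return the k training reports whose WSI embeddings are closest
    (by cosine similarity) to the query WSI embedding."""
    q = query_emb / (np.linalg.norm(query_emb) + 1e-8)
    t = train_embs / (np.linalg.norm(train_embs, axis=1, keepdims=True) + 1e-8)
    sims = t @ q                      # cosine similarity to every training WSI
    top_idx = np.argsort(-sims)[:k]   # indices of the k most similar slides
    return [(int(i), train_reports[i]) for i in top_idx]

def build_icl_prompt(examples: list[tuple[int, str]], instruction: str) -> str:
    """Concatenate retrieved reports into an in-context prompt for the VLM."""
    blocks = [f"Example report {n + 1}:\n{rep}" for n, (_, rep) in enumerate(examples)]
    return instruction + "\n\n" + "\n\n".join(blocks) + "\n\nGenerated report:"

if __name__ == "__main__":
    # Toy usage with random placeholder features.
    rng = np.random.default_rng(0)
    train_embs = rng.normal(size=(100, 512))           # placeholder WSI features
    train_reports = [f"Report text {i}" for i in range(100)]
    query = rng.normal(size=512)
    examples = retrieve_icl_examples(query, train_embs, train_reports, k=3)
    print(build_icl_prompt(examples, "Describe the histopathology findings."))
```

The adaptive-feedback component mentioned in the abstract is not modeled here; this sketch only illustrates how nearest-neighbor WSI-report pairs could populate the in-context prompt.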
Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang
Subject areas: medical research methods; computing technology; computer technology
Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang. Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning [EB/OL]. (2025-06-21) [2025-07-16]. https://arxiv.org/abs/2506.17645.