Cyclic Vision-Language Manipulator: Towards Reliable and Fine-Grained Image Interpretation for Automated Report Generation
Despite significant advances in automated report generation, the limited interpretability of the generated text continues to cast doubt on the reliability of the content produced. This paper introduces a novel approach to identify the specific image features in X-ray images that influence the outputs of report generation models. Specifically, we propose the Cyclic Vision-Language Manipulator (CVLM), a module that generates a manipulated X-ray from an original X-ray and its report, as produced by a designated report generator. The essence of CVLM is that feeding the manipulated X-rays back to the report generator produces altered reports that align with the alterations pre-injected into the reports used for X-ray generation, hence the term "cyclic manipulation". This process allows direct comparison between original and manipulated X-rays, clarifying the critical image features that drive changes in the reports and enabling model users to assess the reliability of the generated text. Empirical evaluations demonstrate that CVLM identifies more precise and reliable features than existing explanation methods, significantly enhancing the transparency and applicability of AI-generated reports.
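The cyclic manipulation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_report_generator`, `toy_manipulator`, and `cyclic_manipulation` are hypothetical stand-ins, the "X-ray" is a tiny grid of pixel intensities, and the real CVLM and report generator are learned models whose interfaces are not given here. The sketch only shows the control flow: generate a report, inject an alteration into it, produce a manipulated image, cycle that image back through the report generator, and compare.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # toy stand-in for an X-ray: rows of pixel intensities


def toy_report_generator(image: Image) -> str:
    """Hypothetical report generator: flags opacity if the top-left pixel is bright."""
    return "opacity present" if image[0][0] > 0.5 else "no opacity"


def toy_manipulator(image: Image, original_report: str, target_report: str) -> Image:
    """Hypothetical stand-in for CVLM: edits the image to match the altered report."""
    edited = [row[:] for row in image]  # copy so the original stays untouched
    if target_report != original_report:
        edited[0][0] = 1.0 if target_report == "opacity present" else 0.0
    return edited


def cyclic_manipulation(
    image: Image,
    target_report: str,
    generate_report: Callable[[Image], str],
    manipulate: Callable[[Image, str, str], Image],
) -> Dict[str, object]:
    """Run one cycle: report -> altered report -> manipulated image -> cycled report."""
    original_report = generate_report(image)
    manipulated = manipulate(image, original_report, target_report)
    cycled_report = generate_report(manipulated)
    # Pixel-wise difference highlights which image features drove the report change.
    difference_map = [
        [abs(a - b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image, manipulated)
    ]
    return {
        "original_report": original_report,
        "cycled_report": cycled_report,
        "cycle_consistent": cycled_report == target_report,
        "difference_map": difference_map,
    }


if __name__ == "__main__":
    xray = [[0.9, 0.1], [0.2, 0.3]]
    result = cyclic_manipulation(xray, "no opacity", toy_report_generator, toy_manipulator)
    print(result)
```

In this toy setting, the only nonzero entry of the difference map is the pixel the report generator relies on, which mirrors the idea of comparing original and manipulated X-rays to locate the features behind a change in the report.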
Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Zhiling Yue, Yijian Gao, Junzhi Ning, Zhi Li, Simon Walsh, Guang Yang
Medical research methods; computing technology, computer technology
Yingying Fang, Zihao Jin, Shaojie Guo, Jinda Liu, Zhiling Yue, Yijian Gao, Junzhi Ning, Zhi Li, Simon Walsh, Guang Yang. Cyclic Vision-Language Manipulator: Towards Reliable and Fine-Grained Image Interpretation for Automated Report Generation [EB/OL]. (2025-06-18) [2025-06-29]. https://arxiv.org/abs/2411.05261.