Multi-MLLM Knowledge Distillation for Out-of-Context News Detection
Multimodal out-of-context news is a type of misinformation in which an image is used outside of its original context. Many existing works have leveraged multimodal large language models (MLLMs) to detect out-of-context news. However, because smaller MLLMs show limited zero-shot performance, these approaches generally rely on label-rich fine-tuning and/or expensive API calls to GPT models to improve performance, which is impractical in low-resource scenarios. In contrast, we aim to improve the performance of small MLLMs in a more label-efficient and cost-effective manner. To this end, we first prompt multiple teacher MLLMs to generate both label predictions and corresponding rationales, which collectively serve as the teachers' knowledge. We then introduce a two-stage knowledge distillation framework to transfer this knowledge to a student MLLM. In Stage 1, we apply LoRA fine-tuning to the student model using all training data. In Stage 2, we further fine-tune the student model using both LoRA fine-tuning and DPO on the data points where the teachers' predictions conflict. This two-stage strategy reduces annotation costs and helps the student model uncover subtle patterns in more challenging cases. Experimental results demonstrate that our approach achieves state-of-the-art performance using less than 10% of the labeled data.
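As a rough illustration of the data split implied by the two stages, the minimal sketch below separates the examples on which the teachers disagree from the full training set. It covers only the selection step; the LoRA and DPO training itself is not shown. All names here (`TeacherOutput`, `Example`, `teachers_conflict`, `split_for_two_stages`) and the label strings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TeacherOutput:
    label: str      # hypothetical label string, e.g. "ooc" or "not_ooc"
    rationale: str  # free-text explanation generated by the teacher


@dataclass
class Example:
    example_id: str
    teacher_outputs: List[TeacherOutput]


def teachers_conflict(example: Example) -> bool:
    """Return True when the teachers predict more than one distinct label."""
    return len({t.label for t in example.teacher_outputs}) > 1


def split_for_two_stages(examples: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Stage 1 uses every example; Stage 2 uses only the conflicting ones."""
    stage1 = list(examples)
    stage2 = [ex for ex in examples if teachers_conflict(ex)]
    return stage1, stage2


examples = [
    Example("a", [TeacherOutput("ooc", "caption names a different event"),
                  TeacherOutput("ooc", "location does not match the image")]),
    Example("b", [TeacherOutput("ooc", "date mismatch"),
                  TeacherOutput("not_ooc", "image is consistent with caption")]),
]
stage1, stage2 = split_for_two_stages(examples)
```

In this sketch only example "b" reaches the second stage, since it is the only one where the two hypothetical teachers disagree.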
Yimeng Gu, Zhao Tong, Ignacio Castro, Shu Wu, Gareth Tyson
Computing technology, computer technology
Yimeng Gu, Zhao Tong, Ignacio Castro, Shu Wu, Gareth Tyson. Multi-MLLM Knowledge Distillation for Out-of-Context News Detection [EB/OL]. (2025-05-28) [2025-06-12]. https://arxiv.org/abs/2505.22517.