Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs

Source: arXiv
Abstract

In this paper, we study the challenges of detecting events on social media, where traditional unimodal systems struggle because data spreads rapidly and in multiple modalities. We evaluate a range of models: the unimodal ModernBERT and ConvNeXt-V2, multimodal fusion techniques, and advanced generative models such as GPT-4o and LLaVA. We also study the effect of providing multimodal generative models (such as GPT-4o) with a single modality to assess their efficacy. Our results indicate that while multimodal approaches notably outperform their unimodal counterparts, generative approaches, despite having a large number of parameters, lag behind supervised methods in precision. We further found that they lag behind instruction-tuned models because of their inability to generate event classes correctly. Our error analysis showed that common social media phenomena such as leet speak and text elongation are handled effectively by generative approaches but are hard to tackle with supervised approaches.

Abhishek Dey, Aabha Bothera, Samhita Sarikonda, Rishav Aryan, Sanjay Kumar Podishetty, Akshay Havalgi, Gaurav Singh, Saurabh Srivastava

Subject: Computing technology, computer technology

Abhishek Dey, Aabha Bothera, Samhita Sarikonda, Rishav Aryan, Sanjay Kumar Podishetty, Akshay Havalgi, Gaurav Singh, Saurabh Srivastava. Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs [EB/OL]. (2025-05-16) [2025-07-09]. https://arxiv.org/abs/2505.10836.