Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs

Abstract

In this paper, we study the challenges of detecting events on social media, where traditional unimodal systems struggle due to the rapid and multimodal nature of data dissemination. We employ a range of models, including the unimodal models ModernBERT and ConvNeXt-V2, multimodal fusion techniques, and advanced generative models such as GPT-4o and LLaVA. We also study the effect of providing multimodal generative models (such as GPT-4o) with only a single modality to assess their efficacy. Our results indicate that while multimodal approaches notably outperform their unimodal counterparts, generative approaches, despite their large number of parameters, lag behind supervised methods in precision. Furthermore, we found that they lag behind instruction-tuned models because of their inability to generate event classes correctly. During our error analysis, we discovered that common social media issues such as leet speak and text elongation are handled effectively by generative approaches but are hard to tackle with supervised approaches.
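The abstract mentions multimodal fusion without specifying the scheme. As a minimal illustration only, the sketch below assumes decision-level (late) fusion: per-class probabilities from a text classifier (for example, one built on ModernBERT) and an image classifier (for example, one built on ConvNeXt-V2) are combined by a weighted average. The function names, the `text_weight` parameter, and the event labels are hypothetical and are not taken from the paper.

```python
from typing import Dict


def late_fusion(
    text_probs: Dict[str, float],
    image_probs: Dict[str, float],
    text_weight: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical late fusion: weighted average of per-class probabilities.

    Assumes both inputs are probability distributions over event labels;
    a label missing from one modality is treated as probability 0.0 there.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    image_weight = 1.0 - text_weight
    labels = set(text_probs) | set(image_probs)
    return {
        label: text_weight * text_probs.get(label, 0.0)
        + image_weight * image_probs.get(label, 0.0)
        for label in labels
    }


def predict_event(fused: Dict[str, float]) -> str:
    """Return the event label with the highest fused probability."""
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Illustrative (made-up) outputs of two unimodal classifiers for one post.
    text_probs = {"flood": 0.40, "fire": 0.35, "none": 0.25}
    image_probs = {"flood": 0.10, "fire": 0.80, "none": 0.10}
    fused = late_fusion(text_probs, image_probs)
    print(predict_event(fused))  # prints fire
```

Here the text model alone would pick "flood", but the confident image model shifts the fused decision to "fire". Late fusion keeps the two unimodal models independent; early fusion (concatenating text and image embeddings before a shared classifier) is a common alternative, and the fusion designs evaluated in the paper may differ from this sketch.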

Authors: Abhishek Dey, Aabha Bothera, Samhita Sarikonda, Rishav Aryan, Sanjay Kumar Podishetty, Akshay Havalgi, Gaurav Singh, Saurabh Srivastava

Subject classification: Computing Technology, Computer Technology

Suggested citation: Abhishek Dey, Aabha Bothera, Samhita Sarikonda, Rishav Aryan, Sanjay Kumar Podishetty, Akshay Havalgi, Gaurav Singh, Saurabh Srivastava. Multimodal Event Detection: Current Approaches and Defining the New Playground through LLMs and VLMs [EB/OL]. (2025-05-16) [2025-07-09]. https://arxiv.org/abs/2505.10836.