Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts

Source: arXiv
Abstract

Emotion recognition and sentiment analysis are pivotal tasks in speech and language processing, particularly in real-world scenarios involving multi-party, conversational data. This paper presents a multimodal approach to tackle these challenges on a well-known dataset. We propose a system that integrates four modalities: pre-trained RoBERTa for text, pre-trained Wav2Vec2 for speech, a proposed FacialNet for facial expressions, and a CNN+Transformer architecture trained from scratch for video analysis. Feature embeddings from each modality are concatenated to form a multimodal vector, which is then used to predict emotion and sentiment labels. The multimodal system demonstrates superior performance compared to unimodal approaches, achieving an accuracy of 66.36% for emotion recognition and 72.15% for sentiment analysis.
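The sketch below illustrates the concatenation-based fusion the abstract describes, assuming standard Hugging Face checkpoints (roberta-base, facebook/wav2vec2-base) for the text and speech encoders. FacialNet and the video CNN+Transformer are paper-specific and not publicly specified, so their embeddings enter as precomputed inputs; the embedding dimensions, mean pooling, and label counts are illustrative assumptions, not the authors' configuration.

```python
# Minimal late-fusion sketch for the four-modality system described in the
# abstract. Hedged assumptions: mean pooling over token/frame embeddings,
# face/video embedding sizes of 512, and 7 emotion / 3 sentiment classes.
import torch
import torch.nn as nn
from transformers import RobertaModel, Wav2Vec2Model


class MultimodalFusionClassifier(nn.Module):
    def __init__(self, face_dim=512, video_dim=512,
                 num_emotions=7, num_sentiments=3):
        super().__init__()
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        self.speech_encoder = Wav2Vec2Model.from_pretrained(
            "facebook/wav2vec2-base")
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.speech_encoder.config.hidden_size
                     + face_dim + video_dim)
        # One linear head per task, both operating on the concatenated
        # multimodal vector.
        self.emotion_head = nn.Linear(fused_dim, num_emotions)
        self.sentiment_head = nn.Linear(fused_dim, num_sentiments)

    def forward(self, input_ids, attention_mask, speech_values,
                face_emb, video_emb):
        # Pool each sequence of embeddings into a single vector per modality
        # (the paper does not specify the pooling; mean pooling is assumed).
        text_emb = self.text_encoder(
            input_ids=input_ids,
            attention_mask=attention_mask).last_hidden_state.mean(dim=1)
        speech_emb = self.speech_encoder(
            speech_values).last_hidden_state.mean(dim=1)
        # Concatenate all four modality embeddings into one multimodal vector.
        fused = torch.cat([text_emb, speech_emb, face_emb, video_emb], dim=-1)
        return self.emotion_head(fused), self.sentiment_head(fused)
```

Concatenation-based (late) fusion keeps each encoder independent, so a stronger unimodal backbone can be swapped in without retraining the others; only the classification heads see the joint representation.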

Masoumeh Chapariniya, Sarah Ebling, Teodora Vukovic, Volker Dellwo, Hossein Ranjbar, Aref Farhadipour

Linguistics

Masoumeh Chapariniya, Sarah Ebling, Teodora Vukovic, Volker Dellwo, Hossein Ranjbar, Aref Farhadipour. Multimodal Emotion Recognition and Sentiment Analysis in Multi-Party Conversation Contexts [EB/OL]. (2025-03-09) [2025-06-21]. https://arxiv.org/abs/2503.06805.
