
PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition


Source: arXiv
Abstract

The emergence of Mamba as an alternative to attention-based architectures has led to the development of Mamba-based self-supervised learning (SSL) pre-trained models (PTMs) for speech and audio processing. Recent studies suggest that these models achieve performance comparable or superior to state-of-the-art (SOTA) attention-based PTMs for speech emotion recognition (SER). Motivated by prior work demonstrating the benefits of PTM fusion across different speech processing tasks, we hypothesize that leveraging the complementary strengths of Mamba-based and attention-based PTMs will enhance SER performance beyond the fusion of homogeneous attention-based PTMs. To this end, we introduce PARROT, a novel framework that integrates parallel branch fusion with Optimal Transport and the Hadamard product. Our approach achieves SOTA results against individual PTMs, homogeneous PTM fusion, and baseline fusion techniques, highlighting the potential of heterogeneous PTM fusion for SER.
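The abstract names PARROT's three building blocks (parallel branches over two PTMs, a Hadamard product for fusion, and Optimal Transport for cross-branch alignment) but not how they are wired together. Below is a minimal, hypothetical PyTorch sketch of one plausible wiring, assuming frozen utterance-level features from a Mamba-based and an attention-based PTM; the names ParallelBranchFusion and sinkhorn_plan, the Sinkhorn hyperparameters, and all dimensions are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


def sinkhorn_plan(cost: torch.Tensor, n_iters: int = 20, eps: float = 0.1) -> torch.Tensor:
    """Entropic-regularized OT plan between two uniform batches (plain Sinkhorn)."""
    n, m = cost.shape
    # Scale the regularizer to the cost magnitude so exp() does not underflow.
    K = torch.exp(-cost / (eps * cost.max().clamp_min(1e-8)))  # Gibbs kernel, (n, m)
    a = torch.full((n,), 1.0 / n, device=cost.device)  # uniform source marginal
    b = torch.full((m,), 1.0 / m, device=cost.device)  # uniform target marginal
    v = torch.ones(m, device=cost.device)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)         # transport plan, (n, m)


class ParallelBranchFusion(nn.Module):
    """Two parallel PTM branches fused by a Hadamard product, with an OT alignment loss."""

    def __init__(self, dim_mamba: int, dim_attn: int, dim_fused: int, n_classes: int):
        super().__init__()
        self.proj_mamba = nn.Linear(dim_mamba, dim_fused)  # Mamba-based PTM branch
        self.proj_attn = nn.Linear(dim_attn, dim_fused)    # attention-based PTM branch
        self.classifier = nn.Linear(dim_fused, n_classes)

    def forward(self, feat_mamba: torch.Tensor, feat_attn: torch.Tensor):
        z_m = self.proj_mamba(feat_mamba)              # (batch, dim_fused)
        z_a = self.proj_attn(feat_attn)                # (batch, dim_fused)
        fused = z_m * z_a                              # Hadamard-product fusion
        cost = torch.cdist(z_m, z_a)                   # pairwise cost across the batch
        ot_loss = (sinkhorn_plan(cost) * cost).sum()   # OT alignment objective
        return self.classifier(fused), ot_loss


# Usage with dummy features: a 512-d Mamba-PTM embedding and a 768-d attention-PTM embedding
# (both sizes are assumptions) fused to 256-d for a 4-class emotion head.
logits, ot_loss = ParallelBranchFusion(512, 768, 256, 4)(
    torch.randn(8, 512), torch.randn(8, 768)
)
```

In this reading, the Hadamard product forces the two projected embeddings into a shared coordinate space, while the OT term can be added to the classification loss to pull the branches' batch distributions toward alignment; how PARROT actually combines these objectives is specified in the paper itself.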

Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma

Subjects: Computing Technology, Computer Technology

Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Jaya Sai Kiran Patibandla, Arun Balaji Buduru, Rajesh Sharma. PARROT: Synergizing Mamba and Attention-based SSL Pre-Trained Models via Parallel Branch Hadamard Optimal Transport for Speech Emotion Recognition [EB/OL]. (2025-06-01) [2025-06-19]. https://arxiv.org/abs/2506.01138
