Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning

Abstract

This paper introduces Meta-PerSER, a novel meta-learning framework that personalizes Speech Emotion Recognition (SER) by adapting to each listener's unique way of interpreting emotion. Conventional SER systems rely on aggregated annotations, which often overlook individual subtleties and lead to inconsistent predictions. In contrast, Meta-PerSER leverages a Model-Agnostic Meta-Learning (MAML) approach enhanced with Combined-Set Meta-Training, Derivative Annealing, and per-layer per-step learning rates, enabling rapid adaptation with only a few labeled examples. By integrating robust representations from pre-trained self-supervised models, our framework first captures general emotional cues and then fine-tunes itself to personal annotation styles. Experiments on the IEMOCAP corpus demonstrate that Meta-PerSER significantly outperforms baseline methods in both seen and unseen data scenarios, highlighting its promise for personalized emotion recognition.
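The few-shot adaptation step mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a two-parameter linear regressor in place of an SER model, treats each parameter (`w`, `b`) as a "layer", and uses hand-picked learning rates, whereas in a meta-learning setup such rates would typically be learned in the outer loop. All names (`adapt`, `step_lrs`, `support`) are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]
Example = Tuple[float, float]  # (input x, target y)


def mse(params: Params, data: List[Example]) -> float:
    """Mean squared error of the toy model y = w * x + b."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)


def mse_grads(params: Params, data: List[Example]) -> Params:
    """Analytic gradient of the mean squared error w.r.t. each 'layer'."""
    n = len(data)
    gw = sum(2.0 * (params["w"] * x + params["b"] - y) * x for x, y in data) / n
    gb = sum(2.0 * (params["w"] * x + params["b"] - y) for x, y in data) / n
    return {"w": gw, "b": gb}


def adapt(meta_params: Params, support: List[Example], step_lrs: List[Params]) -> Params:
    """Inner loop: one gradient step per entry in step_lrs.

    step_lrs[k][name] is the learning rate for layer `name` at step k,
    i.e. a per-layer, per-step learning rate.
    """
    params = dict(meta_params)  # copy so the meta-initialization is untouched
    for lrs in step_lrs:
        grads = mse_grads(params, support)
        params = {name: params[name] - lrs[name] * grads[name] for name in params}
    return params


# Hypothetical "listener" whose labels follow y = 2x + 1, seen through 3 examples.
support = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
meta_params = {"w": 0.0, "b": 0.0}
# Assumed values; a different rate for each layer at each of the 3 steps.
step_lrs = [
    {"w": 0.10, "b": 0.05},
    {"w": 0.10, "b": 0.05},
    {"w": 0.05, "b": 0.05},
]

adapted = adapt(meta_params, support, step_lrs)
print("loss before:", mse(meta_params, support))
print("loss after: ", mse(adapted, support))
```

The sketch covers only the inner loop. The outer meta-update, Combined-Set Meta-Training, and Derivative Annealing named in the abstract are not shown here.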

Authors: Liang-Yeh Shen, Shi-Xin Fang, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee

Subject classification: Computing technology, computer technology

Recommended citation: Liang-Yeh Shen, Shi-Xin Fang, Yi-Cheng Lin, Huang-Cheng Chou, Hung-yi Lee. Meta-PerSER: Few-Shot Listener Personalized Speech Emotion Recognition via Meta-learning [EB/OL]. (2025-05-22) [2025-06-07]. https://arxiv.org/abs/2505.16220.
