EASY: Emotion-aware Speaker Anonymization via Factorized Distillation

Source: arXiv
Abstract

Emotion plays a significant role in speech interaction, conveyed through tone, pitch, and rhythm, enabling the expression of feelings and intentions beyond words to create a more personalized experience. However, most existing speaker anonymization systems employ parallel disentanglement methods, which only separate speech into linguistic content and speaker identity, often neglecting the preservation of the original emotional state. In this study, we introduce EASY, an emotion-aware speaker anonymization framework. EASY employs a novel sequential disentanglement process to disentangle speaker identity, linguistic content, and emotional representation, modeling each speech attribute in distinct subspaces through a factorized distillation approach. By independently constraining speaker identity and emotional representation, EASY minimizes information leakage, enhancing privacy protection while preserving original linguistic content and emotional state. Experimental results on the VoicePrivacy Challenge official datasets demonstrate that our proposed approach outperforms all baseline systems, effectively protecting speaker privacy while maintaining linguistic content and emotional state.

Jixun Yao, Hexin Liu, Eng Siong Chng, Lei Xie

Computing technology, computer technology

Jixun Yao, Hexin Liu, Eng Siong Chng, Lei Xie. EASY: Emotion-aware Speaker Anonymization via Factorized Distillation [EB/OL]. (2025-05-20) [2025-06-24]. https://arxiv.org/abs/2505.15004.
