SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization

Abstract

Voice anonymization protects speaker privacy by concealing identity while preserving linguistic and paralinguistic content. Self-supervised learning (SSL) representations encode linguistic features but also retain speaker traits. We propose a novel speaker-embedding-free framework called SEF-MK. Instead of using a single k-means model trained on the entire dataset, SEF-MK anonymizes the SSL representations of each utterance by randomly selecting one of multiple k-means models, each trained on a different subset of speakers. We explore this approach from both the attacker's and the user's perspectives. Extensive experiments show that, compared to a single k-means model, SEF-MK with multiple k-means models better preserves linguistic and emotional content from the user's viewpoint. However, from the attacker's perspective, using multiple k-means models boosts the effectiveness of privacy attacks. These insights can help users design voice anonymization systems that mitigate attacker threats.
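
The per-utterance multi-k-means quantization described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes SSL features have already been extracted as per-frame vectors grouped by speaker, uses plain Lloyd's k-means, and splits speakers round-robin into disjoint subsets (the abstract only says "different subsets"). The function names (`train_multi_kmeans`, `anonymize_utterance`) and parameters (`num_models`, `k`, `iters`) are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two frame vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[Vector], x: Sequence[float]) -> int:
    """Index of the centroid closest to frame x."""
    return min(range(len(centroids)), key=lambda i: squared_distance(centroids[i], x))


def train_kmeans(frames: List[Vector], k: int, iters: int, rng: random.Random) -> List[Vector]:
    """Plain Lloyd's k-means on a list of frame vectors; returns k centroids."""
    centroids = [list(c) for c in rng.sample(frames, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for x in frames:
            clusters[nearest(centroids, x)].append(x)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def train_multi_kmeans(
    features_by_speaker: Dict[str, List[Vector]],
    num_models: int,
    k: int,
    iters: int = 10,
    seed: int = 0,
) -> List[List[Vector]]:
    """Hypothetical helper. Assumption: speakers are shuffled and split
    round-robin into num_models disjoint subsets; one k-means codebook
    is trained per subset."""
    rng = random.Random(seed)
    speakers = sorted(features_by_speaker)
    rng.shuffle(speakers)
    models = []
    for m in range(num_models):
        subset = speakers[m::num_models]
        frames = [f for spk in subset for f in features_by_speaker[spk]]
        models.append(train_kmeans(frames, k, iters, rng))
    return models


def anonymize_utterance(
    frames: List[Vector], models: List[List[Vector]], rng: random.Random
) -> List[Vector]:
    """Hypothetical helper: draw ONE codebook at random for the whole
    utterance, then replace every frame with its nearest centroid."""
    centroids = rng.choice(models)
    return [list(centroids[nearest(centroids, x)]) for x in frames]


if __name__ == "__main__":
    # Toy stand-in for SSL features: 6 speakers, 20 two-dimensional frames each.
    data_rng = random.Random(1)
    feats = {
        f"spk{s}": [[data_rng.gauss(s, 1.0), data_rng.gauss(-s, 1.0)] for _ in range(20)]
        for s in range(6)
    }
    codebooks = train_multi_kmeans(feats, num_models=3, k=4)
    anon = anonymize_utterance(feats["spk0"][:5], codebooks, random.Random(7))
    print(len(codebooks), len(anon))  # prints: 3 5
```

The one choice taken directly from the abstract is that the codebook is drawn once per utterance rather than once per frame. How the quantized features are turned back into speech, and the actual number of models and clusters, are outside the scope of this sketch.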

Authors: Beilong Tang, Xiaoxiao Miao, Xin Wang, Ming Li

Subject classification: Computing Technology, Computer Technology; Automation Technology, Automation Equipment

Recommended citation: Beilong Tang, Xiaoxiao Miao, Xin Wang, Ming Li. SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization [EB/OL]. (2025-08-15) [2025-08-24]. https://arxiv.org/abs/2508.07086.
