
Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers

Source: arXiv
Abstract

In recent years, the impact of self-supervised speech Transformers has extended to speaker-related applications. However, little research has explored how these models encode speaker information. In this work, we address this gap by identifying neurons in the feed-forward layers that are correlated with speaker information. Specifically, we analyze neurons associated with k-means clusters of self-supervised features and i-vectors. Our analysis reveals that these clusters correspond to broad phonetic and gender classes, making them suitable for identifying neurons that represent speakers. By protecting these neurons during pruning, we can significantly preserve performance on speaker-related tasks, demonstrating their crucial role in encoding speaker information.
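The abstract's two-step recipe — score each feed-forward neuron by how strongly it correlates with a speaker-related label, then exempt the top-scoring neurons from pruning — can be illustrated with a minimal numpy sketch. The synthetic activations, the binary label (standing in for a gender or cluster assignment), and the helper `neuron_label_correlation` are all illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feed-forward activations: (frames, neurons). Hypothetical stand-in for
# real self-supervised Transformer features.
n_frames, n_neurons = 2000, 64
acts = rng.standard_normal((n_frames, n_neurons))
labels = rng.integers(0, 2, n_frames)   # e.g. a binary gender/cluster id per frame
acts[:, :8] += labels[:, None] * 2.0    # make the first 8 neurons label-correlated

def neuron_label_correlation(acts, labels):
    """Absolute Pearson correlation between each neuron's activation and a binary label."""
    z_acts = (acts - acts.mean(0)) / acts.std(0)
    z_lab = (labels - labels.mean()) / labels.std()
    return np.abs(z_acts.T @ z_lab) / len(labels)

corr = neuron_label_correlation(acts, labels)
protected = np.argsort(corr)[-8:]       # neurons most correlated with the label

# Magnitude pruning that keeps a fixed budget of neurons but never drops
# the protected, speaker-correlated ones.
weights = rng.standard_normal(n_neurons)  # toy per-neuron importance scores
keep_budget = 16
kept = list(protected)
for i in np.argsort(np.abs(weights))[::-1]:
    if len(kept) >= keep_budget:
        break
    if i not in kept:
        kept.append(i)
mask = np.zeros(n_neurons, dtype=bool)
mask[kept] = True                       # True = neuron survives pruning
```

Here the protected set is simply the top correlators; the paper's analysis ties these neurons to broad phonetic and gender classes, which is why shielding them preserves speaker-task performance.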

Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang

Subject: Computing Technology; Computer Technology

Tzu-Quan Lin, Hsi-Chun Cheng, Hung-yi Lee, Hao Tang. Identifying Speaker Information in Feed-Forward Layers of Self-Supervised Speech Transformers [EB/OL]. (2025-06-26) [2025-07-09]. https://arxiv.org/abs/2506.21712.