Revealing the Hidden Temporal Structure of HubertSoft Embeddings based on the Russian Phonetic Corpus

Source: arXiv
Abstract

Self-supervised learning (SSL) models such as Wav2Vec 2.0 and HuBERT have shown remarkable success in extracting phonetic information from raw audio without labelled data. While prior work has demonstrated that SSL embeddings encode phonetic features at the frame level, it remains unclear whether these models preserve temporal structure: specifically, whether embeddings at phoneme boundaries reflect the identity and order of adjacent phonemes. This study investigates the extent to which boundary-sensitive embeddings from HubertSoft, a soft-clustering variant of HuBERT, encode phoneme transitions. Using the CORPRES Russian speech corpus, we labelled 20 ms embedding windows with triplets of phonemes corresponding to their start, centre, and end segments. A neural network was trained to predict these positions separately, and several evaluation metrics, including ordered accuracy, unordered accuracy, and a flexible centre accuracy, were used to assess temporal sensitivity. Results show that embeddings extracted at phoneme boundaries capture both phoneme identity and temporal order, with especially high accuracy at segment boundaries. Confusion patterns further suggest that the model encodes articulatory detail and coarticulatory effects. These findings contribute to our understanding of the internal structure of SSL speech representations and their potential for phonological analysis and fine-grained transcription tasks.
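
The abstract names three metrics but does not define them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each 20 ms window carries a (start, centre, end) phoneme triplet; that ordered accuracy requires all three positions to match; that unordered accuracy compares the triplets as multisets; and that flexible centre accuracy requires the start and end to match while accepting any of the three reference phonemes in the centre slot. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

# Assumed label format: (start, centre, end) phonemes for one 20 ms window.
Triplet = Tuple[str, str, str]


def _check(pred: Sequence[Triplet], true: Sequence[Triplet]) -> None:
    """Reject empty or mismatched inputs before computing a ratio."""
    if len(pred) != len(true) or len(true) == 0:
        raise ValueError("pred and true must be non-empty and of equal length")


def ordered_accuracy(pred: Sequence[Triplet], true: Sequence[Triplet]) -> float:
    """Share of windows where all three positions match in order (assumed definition)."""
    _check(pred, true)
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def unordered_accuracy(pred: Sequence[Triplet], true: Sequence[Triplet]) -> float:
    """Share of windows where the triplets match as multisets (assumed definition)."""
    _check(pred, true)
    return sum(Counter(p) == Counter(t) for p, t in zip(pred, true)) / len(true)


def flexible_centre_accuracy(pred: Sequence[Triplet], true: Sequence[Triplet]) -> float:
    """Share of windows where start and end match exactly and the predicted
    centre is any phoneme of the reference triplet (assumed definition)."""
    _check(pred, true)
    hits = sum(
        p[0] == t[0] and p[2] == t[2] and p[1] in t
        for p, t in zip(pred, true)
    )
    return hits / len(true)


# Toy labels for illustration only; they are not taken from CORPRES.
true = [("a", "a", "t"), ("t", "t", "o"), ("o", "k", "k")]
pred = [("a", "a", "t"), ("t", "o", "o"), ("k", "o", "k")]

print(ordered_accuracy(pred, true))
print(unordered_accuracy(pred, true))
print(flexible_centre_accuracy(pred, true))
```

Under these assumed definitions, the second toy window fails the ordered and unordered checks but passes the flexible centre check, because only the position of the transition inside the window is wrong. The third window passes only the unordered check, because the right phonemes are predicted in the wrong order.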

Anastasia Ananeva, Anton Tomilov, Marina Volkova

Subjects: Linguistics; Commonly Used Foreign Languages

Anastasia Ananeva, Anton Tomilov, Marina Volkova. Revealing the Hidden Temporal Structure of HubertSoft Embeddings based on the Russian Phonetic Corpus [EB/OL]. (2025-07-09) [2025-07-23]. https://arxiv.org/abs/2507.06794.
