Analysis of ABC Frontend Audio Systems for the NIST-SRE24

Source: arXiv
Abstract

We present a comprehensive analysis of the embedding extractors (frontends) developed by the ABC team for the audio track of NIST SRE 2024. We follow the two scenarios imposed by NIST: using only a provided set of telephone recordings for training (fixed condition) or adding publicly available data (open condition). Under these constraints, we develop the best possible speaker embedding extractors for the predominant conversational telephone speech (CTS) domain. We explore architectures based on ResNet with different pooling mechanisms, the recently introduced ReDimNet architecture, and a system based on the XLS-R model, which represents the family of large pre-trained self-supervised models. In the open condition, we train on the VoxBlink2 dataset, which contains 110 thousand speakers across multiple languages. We observe good performance and robustness from VoxBlink-trained models, and our experiments yield practical recipes for developing state-of-the-art frontends for speaker recognition.

Sara Barahona, Anna Silnova, Ladislav Mošner, Junyi Peng, Oldřich Plchot, Johan Rohdin, Lin Zhang, Jiangyu Han, Petr Palka, Federico Landini, Lukáš Burget, Themos Stafylakis, Sandro Cumani, Dominik Boboš, Miroslav Hlaváček, Martin Kodovsky, Tomáš Pavlíček

Subject: Communications, Wireless Communications

Sara Barahona, Anna Silnova, Ladislav Mošner, Junyi Peng, Oldřich Plchot, Johan Rohdin, Lin Zhang, Jiangyu Han, Petr Palka, Federico Landini, Lukáš Burget, Themos Stafylakis, Sandro Cumani, Dominik Boboš, Miroslav Hlaváček, Martin Kodovsky, Tomáš Pavlíček. Analysis of ABC Frontend Audio Systems for the NIST-SRE24 [EB/OL]. (2025-05-21) [2025-06-06]. https://arxiv.org/abs/2505.15320.