On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition

Source: arXiv
Abstract

This paper proposes a novel mixture-of-experts (MoE) speaker adaptation framework for dysarthric speech recognition based on speech foundation models. The approach enables zero-shot adaptation and real-time processing while incorporating domain knowledge. Adapter experts conditioned on speech impairment severity and gender are dynamically combined using speaker-dependent routing parameters that are predicted on the fly. KL divergence is used to further enforce diversity among the experts and to improve their generalization to unseen speakers. Experimental results on the UASpeech corpus suggest that on-the-fly MoE-based adaptation produces statistically significant word error rate (WER) reductions of up to 1.34% absolute (6.36% relative) over the unadapted baseline HuBERT/WavLM models. Consistent WER reductions of up to 2.55% absolute (11.44% relative) and real-time factor (RTF) speedups of up to 7 times are obtained over batch-mode adaptation across varying quantities of speaker-level data. The lowest published WER of 16.35% (46.77% on very low intelligibility speech) is obtained.
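
The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the adapter architecture, the routing predictor, or the exact form of the KL-divergence objective. The sketch assumes that each expert yields an output vector for a frame, that routing logits have already been predicted for the speaker, and that the logits are turned into mixing weights with a softmax. The names `combine_experts` and `kl_divergence` are hypothetical.

```python
import math


def softmax(logits):
    """Turn routing logits into non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_experts(expert_outputs, routing_logits):
    """Mix expert output vectors with softmax routing weights.

    Assumption: expert_outputs is a list of equal-length vectors, one per
    expert, and routing_logits holds one predicted logit per expert.
    """
    weights = softmax(routing_logits)
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, out in zip(weights, expert_outputs):
        for i in range(dim):
            mixed[i] += w * out[i]
    return mixed, weights


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions.

    Assumption: a larger divergence between the distributions produced by
    two experts is read here as greater expert diversity.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

With equal routing logits the mixture reduces to a plain average of the expert outputs. Because the weights are computed per input rather than estimated from a batch of speaker data, a combination of this kind needs no adaptation data from the target speaker.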

Shujie HU, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu

Computing technology, computer technology

Shujie HU, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu. On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition [EB/OL]. (2025-05-28) [2025-06-25]. https://arxiv.org/abs/2505.22072.
