On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition

Source: arXiv

Abstract

This paper proposes a novel MoE-based speaker adaptation framework for foundation-model-based dysarthric speech recognition. The approach enables zero-shot adaptation and real-time processing while incorporating domain knowledge. Adapter experts conditioned on speech impairment severity and gender are dynamically combined using speaker-dependent routing parameters that are predicted on the fly. KL-divergence is used to further enforce diversity among the experts and improve their generalization to unseen speakers. Experimental results on the UASpeech corpus suggest that on-the-fly MoE-based adaptation produces statistically significant WER reductions of up to 1.34% absolute (6.36% relative) over the unadapted baseline HuBERT/WavLM models. Consistent WER reductions of up to 2.55% absolute (11.44% relative) and RTF speedups of up to 7 times are obtained over batch-mode adaptation across varying speaker-level data quantities. The lowest published WER of 16.35% (46.77% on very low intelligibility) is obtained.
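The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bottleneck adapter form, the mean-pooled router input, the residual connection, and all names and dimensions (`adapter`, `moe_adapt`, `router_w`, `T`, `D`, `B`, `K`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adapter(h, down, up):
    # Hypothetical bottleneck adapter: project down, apply ReLU, project up.
    return np.maximum(h @ down, 0.0) @ up

def moe_adapt(h, experts, router_w):
    # h: (T, D) frame-level hidden states for a single utterance.
    # Routing weights are predicted from the utterance itself (mean-pooled
    # features), so no speaker enrollment data is needed at test time.
    weights = softmax(h.mean(axis=0) @ router_w)  # shape (K,)
    mixed = sum(w * adapter(h, d, u) for w, (d, u) in zip(weights, experts))
    return h + mixed, weights  # residual connection (assumed)

rng = np.random.default_rng(0)
T, D, B, K = 50, 16, 4, 3  # frames, hidden size, bottleneck size, experts
experts = [(0.1 * rng.normal(size=(D, B)), 0.1 * rng.normal(size=(B, D)))
           for _ in range(K)]
router_w = 0.1 * rng.normal(size=(D, K))
h = rng.normal(size=(T, D))

out, weights = moe_adapt(h, experts, router_w)
```

In this sketch each expert would stand for one severity or gender condition, and the router output is a convex combination over them, which is one way the experts could be blended for an unseen speaker.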

Authors: Shujie HU, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu

Subject classification: Computing technology, computer technology

Recommended citation: Shujie HU, Xurong Xie, Mengzhe Geng, Jiajun Deng, Huimeng Wang, Guinan Li, Chengxi Deng, Tianzi Wang, Mingyu Cui, Helen Meng, Xunying Liu. On-the-fly Routing for Zero-shot MoE Speaker Adaptation of Speech Foundation Models for Dysarthric Speech Recognition [EB/OL]. (2025-05-28) [2025-06-25]. https://arxiv.org/abs/2505.22072.
