Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning
Large-scale multilingual ASR models like Whisper excel in high-resource settings but face challenges in low-resource scenarios, such as rare languages and code-switching (CS), due to computational costs and catastrophic forgetting. We explore Soft Prompt Tuning (SPT), a parameter-efficient method to enhance CS ASR while preserving prior knowledge. We evaluate two strategies: (1) full fine-tuning (FFT) of both soft prompts and the entire Whisper model, demonstrating improved cross-lingual capabilities compared to traditional methods, and (2) adhering to SPT's original design by freezing model parameters and only training soft prompts. Additionally, we introduce SPT4ASR, a combination of different SPT variants. Experiments on the SEAME and ASRU2019 datasets show that deep prompt tuning is the most effective SPT approach, and our SPT4ASR methods achieve further error reductions in CS ASR, maintaining parameter efficiency similar to LoRA, without degrading performance on existing languages.
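The abstract describes Soft Prompt Tuning with frozen Whisper parameters. The sketch below illustrates one minimal way this idea can be realized with the Hugging Face transformers library, by prepending learnable prompt vectors to the frozen encoder's output states. The class name, prompt length, and the choice of insertion point (encoder outputs rather than per-layer deep prompts) are illustrative assumptions, not the paper's exact SPT4ASR configuration.

```python
import torch
import torch.nn as nn
from transformers import WhisperForConditionalGeneration


class SoftPromptedWhisper(nn.Module):
    """Whisper with trainable soft prompts prepended to the encoder states.

    A minimal sketch: only the soft prompt embeddings are trained; all
    Whisper weights stay frozen, as in the SPT setting described above.
    """

    def __init__(self, model_name="openai/whisper-small", prompt_len=16):
        super().__init__()
        self.whisper = WhisperForConditionalGeneration.from_pretrained(model_name)
        # Freeze every Whisper parameter; only the soft prompts receive gradients.
        for p in self.whisper.parameters():
            p.requires_grad = False
        d_model = self.whisper.config.d_model
        # Learnable prompt embeddings of shape (prompt_len, d_model).
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, input_features, labels):
        # Run the frozen encoder on the log-Mel input features.
        enc = self.whisper.model.encoder(input_features).last_hidden_state
        prompts = self.soft_prompt.unsqueeze(0).expand(enc.size(0), -1, -1)
        # Prepend the soft prompts so the decoder cross-attends to them.
        enc = torch.cat([prompts, enc], dim=1)
        out = self.whisper(encoder_outputs=(enc,), labels=labels)
        return out.loss
```

In this sketch only `soft_prompt` is updated, so the trainable parameter count is on the order of `prompt_len * d_model`, in the same parameter-efficient spirit as LoRA. Deep prompt tuning, which the abstract identifies as the strongest SPT variant, would instead inject such vectors at every transformer layer rather than only at the encoder output.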
Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li
Computing Technology, Computer Technology
Hongli Yang, Yizhou Peng, Hao Huang, Sheng Li. Adapting Whisper for Parameter-efficient Code-Switching Speech Recognition via Soft Prompt Tuning [EB/OL]. (2025-06-16) [2025-07-25]. https://arxiv.org/abs/2506.21576.