Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down
OpenAI's Whisper has achieved significant success in Automatic Speech Recognition. However, it has consistently been found to hallucinate, particularly on non-speech segments, which limits its broader application in complex industrial settings. In this paper, we introduce a novel method to reduce Whisper's hallucination on non-speech segments without using any pre- or post-processing techniques. Specifically, we benchmark the contribution of each self-attention head in the Whisper-large-v3 decoder to the hallucination problem by performing a head-wise mask. Our findings reveal that only 3 of the 20 heads account for over 75% of the hallucinations on the UrbanSound dataset. We then fine-tune these three "crazy" heads using a collection of non-speech data. The results show that our best fine-tuned model, Calm-Whisper, achieves over 80% reduction in non-speech hallucination with less than 0.1% WER degradation on LibriSpeech test-clean and test-other.
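The head-wise mask described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is a toy NumPy causal self-attention layer with 20 heads, where the function names (`self_attention`, `head_ablation_scores`), the small model width, and the random weights are all assumptions made for illustration. The paper scores each head by hallucinations on non-speech audio; here the shift in the layer output serves as a stand-in metric.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv, wo, n_heads, head_mask=None):
    """Toy causal multi-head self-attention; head_mask[h] == 0 silences head h."""
    t, d = x.shape
    d_head = d // n_heads
    if head_mask is None:
        head_mask = np.ones(n_heads)

    def split(m):
        # (t, d) -> (n_heads, t, d_head)
        return m.reshape(t, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    causal = np.tril(np.ones((t, t), dtype=bool))
    scores = np.where(causal, scores, -np.inf)  # decoder-style causal mask
    per_head = softmax(scores) @ v              # (n_heads, t, d_head)
    # Head-wise mask: scale each head's output before the output projection.
    per_head = per_head * np.asarray(head_mask, dtype=x.dtype)[:, None, None]
    merged = per_head.transpose(1, 0, 2).reshape(t, d)
    return merged @ wo

def head_ablation_scores(x, weights, n_heads):
    """Mask one head at a time and report how far the layer output moves.

    Stand-in metric (assumption): the paper measures hallucination on
    non-speech audio instead of an output norm.
    """
    base = self_attention(x, *weights, n_heads)
    scores = []
    for h in range(n_heads):
        mask = np.ones(n_heads)
        mask[h] = 0.0
        out = self_attention(x, *weights, n_heads, head_mask=mask)
        scores.append(float(np.linalg.norm(base - out)))
    return scores

# Hypothetical toy setup: 20 heads as in the abstract, tiny width, random weights.
rng = np.random.default_rng(0)
n_heads, d_model, t = 20, 40, 5
weights = tuple(rng.normal(size=(d_model, d_model)) for _ in range(4))
x = rng.normal(size=(t, d_model))
scores = head_ablation_scores(x, weights, n_heads)
ranked = sorted(range(n_heads), key=lambda h: scores[h], reverse=True)
```

In a real experiment, the ranking would come from decoding non-speech clips with each head masked and counting non-empty transcriptions; the heads at the top of that ranking would be the candidates for fine-tuning.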
Yingzhi Wang, Anas Alhmoud, Saad Alsahly, Muhammad Alqurishi, Mirco Ravanelli
Computing technology, computer technology
Yingzhi Wang, Anas Alhmoud, Saad Alsahly, Muhammad Alqurishi, Mirco Ravanelli. Calm-Whisper: Reduce Whisper Hallucination On Non-Speech By Calming Crazy Heads Down [EB/OL]. (2025-05-19) [2025-06-12]. https://arxiv.org/abs/2505.12969.