Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement

Abstract

In this work, we investigate the application of generative speech enhancement to improve the robustness of ASR models in noisy and reverberant conditions. We employ a recently proposed speech enhancement model based on the Schrödinger bridge, which has been shown to perform well compared to diffusion-based approaches. We analyze the impact of model scaling and different sampling methods on ASR performance. Furthermore, we compare the considered model with predictive and diffusion-based baselines and analyze the speech recognition performance when using different pre-trained ASR models. The proposed approach significantly reduces the word error rate, by approximately 40% relative to the unprocessed speech signals and by approximately 8% relative to a similarly sized predictive approach.
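
The abstract reports its gains as relative word error rate (WER) reductions. As a minimal sketch of how such a figure is typically obtained, the snippet below computes WER as the word-level edit distance divided by the reference length, then the relative reduction between two WER values. The function names `word_error_rate` and `relative_reduction` are hypothetical helpers introduced here for illustration, not the authors' evaluation code, and the text normalization used in the paper is not described in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline` (0.4 means 40% relative)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline
```

With made-up numbers (not taken from the paper), a WER of 20% on unprocessed speech and 12% after enhancement gives `relative_reduction(0.20, 0.12)`, which is approximately 0.4, that is, a 40% relative reduction even though the absolute drop is 8 percentage points.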

Authors: Rauf Nasretdinov, Roman Korostik, Ante Jukić

DOI: 10.1109/ICASSP49660.2025.10890638

Subject classification: Communications

Recommended citation: Rauf Nasretdinov, Roman Korostik, Ante Jukić. Robust Speech Recognition with Schrödinger Bridge-Based Speech Enhancement [EB/OL]. (2025-05-07) [2025-07-17]. https://arxiv.org/abs/2505.04237.
