Pushing the Limits of Beam Search Decoding for Transducer-based ASR models
Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capabilities, and inference speed in greedy decoding. However, beam search significantly slows down Transducers because key network components must be evaluated repeatedly, which limits practical applications. This paper introduces a universal method to accelerate beam search for Transducers, enabling the implementation of two optimized algorithms: ALSD++ and AES++. The proposed method uses batch operations, a tree-based hypothesis structure, novel blank scoring for enhanced shallow fusion, and CUDA graph execution for efficient GPU inference. The method narrows the speed gap between beam and greedy modes to only 10-20% for the whole system, achieves a 14-30% relative improvement in WER compared to greedy decoding, and improves shallow fusion in low-resource settings by up to 11% compared to existing implementations. All the algorithms are open-sourced.
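The abstract names a tree-based hypothesis structure but does not describe it. As a rough illustration of the general idea only, the sketch below stores beam hypotheses as nodes with parent pointers, so hypotheses that share a prefix share storage and extending one costs a single append. The names `Node`, `HypothesisTree`, `extend`, and `backtrack` are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    token: int             # token id stored at this node (-1 for the root)
    parent: Optional[int]  # index of the parent node, None for the root


class HypothesisTree:
    """Hypothetical prefix tree: each hypothesis is identified by a leaf index."""

    def __init__(self) -> None:
        # Node 0 is the root and represents the empty hypothesis.
        self.nodes: List[Node] = [Node(token=-1, parent=None)]

    def extend(self, parent: int, token: int) -> int:
        """Append one token to the hypothesis ending at `parent`; return the new leaf."""
        self.nodes.append(Node(token=token, parent=parent))
        return len(self.nodes) - 1

    def backtrack(self, leaf: int) -> List[int]:
        """Recover the full token sequence by walking parent pointers to the root."""
        tokens: List[int] = []
        idx = leaf
        while self.nodes[idx].parent is not None:
            tokens.append(self.nodes[idx].token)
            idx = self.nodes[idx].parent
        return tokens[::-1]


# Two hypotheses that share the prefix [5] reuse the same node for it.
tree = HypothesisTree()
a = tree.extend(0, 5)
b = tree.extend(a, 7)
c = tree.extend(a, 9)
hyp_b = tree.backtrack(b)
hyp_c = tree.backtrack(c)
```

With this layout, the token sequences are materialized only when a final result is needed, rather than being copied each time a hypothesis is extended.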
Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg
Computing technology, computer technology
Lilit Grigoryan, Vladimir Bataev, Andrei Andrusenko, Hainan Xu, Vitaly Lavrukhin, Boris Ginsburg. Pushing the Limits of Beam Search Decoding for Transducer-based ASR models [EB/OL]. (2025-05-30) [2025-07-01]. https://arxiv.org/abs/2506.00185.