首页|Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge

Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge

来源：

英文摘要

This paper presents our system for the MISP-Meeting Challenge Track 2. The primary difficulty lies in the dataset, which contains strong background noise, reverberation, overlapping speech, and diverse meeting topics. To address these issues, we (a) designed G-SpatialNet, a speech enhancement (SE) model to improve Guided Source Separation (GSS) signals; (b) proposed TLS, a framework comprising time alignment, level alignment, and signal-to-noise ratio filtering, to generate signal-level pseudo labels for real-recorded far-field audio data, thereby facilitating SE models' training; and (c) explored fine-tuning strategies, data augmentation, and multimodal information to enhance the performance of pre-trained Automatic Speech Recognition (ASR) models in meeting scenarios. Finally, our system achieved character error rates (CERs) of 5.44% and 9.52% on the Dev and Eval sets, respectively, with relative improvements of 64.8% and 52.6% over the baseline, securing second place.

作者：Longjie Luo、Shenghui Lu、Lin Li、Qingyang Hong

作者单位：

学科分类：电子技术概论通信

推荐引用：Longjie Luo,Shenghui Lu,Lin Li,Qingyang Hong.Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge[EB/OL].(2025-06-23)[2025-07-21].https://arxiv.org/abs/2505.24446.点此复制

Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge

Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge

评论