Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet
Pre-training methods have greatly improved the performance of sound event localization and detection (SELD). However, existing Transformer-based models still incur a high computational cost. To address this, we present a stereo SELD system that combines a pre-trained PSELDnet with a bidirectional Mamba (BiMamba) sequence model. Specifically, we replace the Conformer module with a BiMamba module and use asymmetric convolutions to better capture the time and frequency relationships in the audio signal. Experimental results on the DCASE2025 Task 3 development dataset show that our method outperforms both the baseline and the original PSELDnet with a Conformer decoder. In addition, the proposed model requires fewer computational resources than the baselines. These results indicate that the BiMamba architecture is effective for addressing key challenges in SELD tasks. The source code is publicly available at https://github.com/alexandergwm/DCASE2025_TASK3_Stereo_PSELD_Mamba.
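To make the idea of bidirectional sequence modeling concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scalar linear recurrence with fixed coefficients `a` and `b` (hypothetical names), whereas a real Mamba layer uses learned, input-dependent parameters, and it assumes the two directions are merged by simple addition. It only illustrates how scanning a frame sequence in both directions lets each frame see past and future context.

```python
from typing import List


def linear_scan(x: List[float], a: float, b: float) -> List[float]:
    """Causal recurrence h[t] = a * h[t-1] + b * x[t], starting from h = 0."""
    h = 0.0
    out = []
    for value in x:
        h = a * h + b * value
        out.append(h)
    return out


def bidirectional_scan(x: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Run the scan forward and backward in time, then sum the two outputs.

    Summation is an assumption made for this sketch; other fusion
    schemes (concatenation, gating) are equally plausible.
    """
    forward = linear_scan(x, a, b)
    # Reverse the input, scan, then reverse the result back to time order.
    backward = linear_scan(x[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


# A single impulse in the middle of a five-frame sequence.
frames = [0.0, 0.0, 1.0, 0.0, 0.0]
print(linear_scan(frames, 0.5, 1.0))  # only frames at or after the impulse respond
print(bidirectional_scan(frames))     # frames on both sides of the impulse respond
```

With the one-directional scan, frames before the impulse stay at zero; the bidirectional version produces a response that is symmetric around the impulse, which is the property that motivates scanning in both directions for offline event detection.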
Wenmiao Gao, Han Yin
Computing Technology, Computer Technology
Wenmiao Gao, Han Yin. Enhancing Stereo Sound Event Detection with BiMamba and Pretrained PSELDnet [EB/OL]. (2025-07-13) [2025-07-24]. https://arxiv.org/abs/2507.09570.