A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
This paper proposes a model that integrates sub-band processing and deep filtering to fully exploit information from the target time-frequency (TF) bin and its surrounding TF bins for single-channel speech enhancement. The sub-band module captures surrounding frequency bin information at the input, while the deep filtering module applies filtering at the output to both the target TF bin and its surrounding TF bins. To further improve the model performance, we decouple deep filtering into temporal and frequency components and introduce a two-stage framework, reducing the complexity of filter coefficient prediction at each stage. Additionally, we propose the TAConv module to strengthen convolutional feature extraction. Experimental results demonstrate that the proposed hierarchical deep filtering network (HDF-Net) effectively utilizes surrounding TF bin information and outperforms other advanced systems while using fewer resources.
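As an illustration of the decoupling described above, the following is a minimal NumPy sketch of deep filtering split into a temporal stage and a frequency stage. It is not the authors' implementation. The function names (`temporal_filter`, `frequency_filter`, `two_stage`), the tap counts, the zero padding at the edges, the use of past frames only (for causality), and the temporal-then-frequency ordering are all assumptions made for illustration. In a real system the coefficient arrays would be predicted by the network; here they are simply passed in.

```python
import numpy as np

def temporal_filter(spec, coefs):
    """Causal temporal filtering (illustrative, hypothetical helper).

    spec:  complex array (T, F), the noisy STFT.
    coefs: complex array (T, F, N), one tap per frame delay 0..N-1.
    out[t, f] = sum_i coefs[t, f, i] * spec[t - i, f],
    with frames before t = 0 treated as zero (assumption).
    """
    T, F = spec.shape
    N = coefs.shape[-1]
    # Prepend N-1 zero frames so every delayed slice has length T.
    padded = np.concatenate([np.zeros((N - 1, F), dtype=spec.dtype), spec], axis=0)
    out = np.zeros_like(spec)
    for i in range(N):
        start = N - 1 - i  # padded[start + t] == spec[t - i]
        out += coefs[:, :, i] * padded[start:start + T]
    return out

def frequency_filter(spec, coefs):
    """Filtering across neighbouring frequency bins (illustrative).

    coefs: complex array (T, F, 2K+1), taps for bin offsets -K..K.
    out[t, f] = sum_j coefs[t, f, j + K] * spec[t, f + j],
    with bins outside 0..F-1 treated as zero (assumption).
    """
    T, F = spec.shape
    K = (coefs.shape[-1] - 1) // 2
    padded = np.pad(spec, ((0, 0), (K, K)))  # zero-pad the frequency axis
    out = np.zeros_like(spec)
    for j in range(-K, K + 1):
        # padded[:, K + j + f] == spec[:, f + j]
        out += coefs[:, :, j + K] * padded[:, K + j:K + j + F]
    return out

def two_stage(spec, temporal_coefs, frequency_coefs):
    """Apply the temporal stage, then the frequency stage (assumed order)."""
    return frequency_filter(temporal_filter(spec, temporal_coefs), frequency_coefs)
```

With N temporal taps and 2K+1 frequency taps, each stage predicts N or 2K+1 coefficients per TF bin rather than the N(2K+1) a joint two-dimensional filter would need, which is one way to read the abstract's claim that decoupling reduces the complexity of coefficient prediction at each stage.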
Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li
Communications; Wireless Communications
Shenghui Lu, Hukai Huang, Jinanglong Yao, Kaidi Wang, Qingyang Hong, Lin Li. A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement [EB/OL]. (2025-06-01) [2025-06-23]. https://arxiv.org/abs/2506.01023.