
Monaural Speech Enhancement Using Combined Convolution Neural Network In The Time Domain

Abstract

Recent studies have shown that convolutional neural networks (CNNs) perform well at modeling the long-term dependencies of speech sequences in the time domain. Stacking multiple dilated convolution layers effectively enlarges the network's receptive field. However, as the dilation rate grows in deeper layers, the feature points that a layer maps back to in the previous layer lie farther apart, so the short-range information between those points is easily neglected. This paper proposes a plug-and-play inverted residual linear bottleneck module, called the combined convolution (CB-Conv) module, that aims to extract this short-range information. The core of the CB-Conv module consists of two parallel convolution blocks: an ordinary dilated convolution block and an aggregation convolution block. The latter uses a pooling layer to aggregate the details lost between adjacent points and combines the result with the output of the ordinary dilated convolution to complete the information extraction. Experiments on the TIMIT dataset show that, with the same number of stacked main modules, the proposed module built on the TasNet framework achieves an SI-SNR gain of 1.04 dB over the Conv-TasNet baseline.
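The idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`dilated_conv1d`, `avg_pool1d`, `combined_conv1d`, `si_snr`) are hypothetical, the choice of average pooling and of an element-wise sum to fuse the two branches are assumptions (the abstract specifies neither the pooling type nor the fusion rule), and the inverted residual bottleneck is not modeled. The `si_snr` helper follows the commonly used scale-invariant SNR definition, which the abstract names but does not spell out.

```python
import math

def dilated_conv1d(x, weights, dilation):
    """Same-length 1-D dilated convolution with zero padding and a centered kernel."""
    half = (len(weights) - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps are `dilation` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def avg_pool1d(x, width):
    """Same-length moving average over up to `width` neighbouring points (odd width)."""
    half = width // 2
    out = []
    for t in range(len(x)):
        window = x[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out

def combined_conv1d(x, weights, dilation):
    """Assumed form of the two parallel branches, fused by element-wise sum."""
    # Branch 1: ordinary dilated convolution; it skips the points between taps.
    plain = dilated_conv1d(x, weights, dilation)
    # Branch 2 (assumption): pool neighbours first, so each tap summarises
    # the points that the dilation would otherwise skip.
    width = dilation if dilation % 2 == 1 else dilation + 1
    aggregated = dilated_conv1d(avg_pool1d(x, width), weights, dilation)
    return [p + a for p, a in zip(plain, aggregated)]

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a clean target."""
    n = len(target)
    mean_e, mean_t = sum(estimate) / n, sum(target) / n
    e = [v - mean_e for v in estimate]   # remove DC offset
    s = [v - mean_t for v in target]
    scale = sum(a * b for a, b in zip(e, s)) / (sum(b * b for b in s) + eps)
    s_target = [scale * b for b in s]    # projection of estimate onto target
    e_noise = [a - b for a, b in zip(e, s_target)]
    ratio = sum(v * v for v in s_target) / (sum(v * v for v in e_noise) + eps)
    return 10 * math.log10(ratio)

# An impulse at index 5 falls between the taps (0, 4, 8) of a dilation-4 kernel at t = 4.
x = [0.0] * 9
x[5] = 1.0
plain_out = dilated_conv1d(x, [1.0, 1.0, 1.0], 4)
combined_out = combined_conv1d(x, [1.0, 1.0, 1.0], 4)
```

In this toy input the ordinary dilated branch gives no response at `t = 4`, because the impulse lies between its taps, while the pooled branch picks it up. That is the short-range information the abstract says is lost at high dilation rates.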

Yu Jiacheng (余嘉诚), Zhang Cheng (张程), Jiang Ting (蒋挺)

Communications

speech enhancement; combined convolution; monaural

Yu Jiacheng, Zhang Cheng, Jiang Ting. Monaural Speech Enhancement Using Combined Convolution Neural Network In The Time Domain [EB/OL]. (2022-04-08) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/202204-126.
