Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement

Abstract

Recently, a variational autoencoder (VAE)-based single-channel speech enhancement system using Bayesian permutation training has been proposed, which uses two pretrained VAEs to obtain latent representations for speech and noise. Based on these pretrained VAEs, a noisy VAE learns to generate speech and noise latent representations from noisy speech for speech enhancement. Modifying the pretrained VAE loss terms affects the pretrained speech and noise latent representations. In this paper, we investigate how these different representations affect speech enhancement performance. Experiments on the DNS3, WSJ0-QUT, and VoiceBank-DEMAND datasets show that a latent space where speech and noise representations are clearly separated significantly improves performance over standard VAEs, which produce overlapping speech and noise representations.

Authors: Jiatong Li, Simon Doclo

Subject classification: Communications; Wireless Communications

Recommended citation: Jiatong Li, Simon Doclo. Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement [EB/OL]. (2025-08-07)[2025-08-18]. https://arxiv.org/abs/2508.05293.
