
Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement


Source: arXiv
Abstract

Recently, a variational autoencoder (VAE)-based single-channel speech enhancement system using Bayesian permutation training has been proposed, which uses two pretrained VAEs to obtain latent representations for speech and noise. Based on these pretrained VAEs, a noisy VAE learns to generate speech and noise latent representations from noisy speech for speech enhancement. Modifying the pretrained VAE loss terms affects the pretrained speech and noise latent representations. In this paper, we investigate how these different representations affect speech enhancement performance. Experiments on the DNS3, WSJ0-QUT, and VoiceBank-DEMAND datasets show that a latent space where speech and noise representations are clearly separated significantly improves performance over standard VAEs, which produce overlapping speech and noise representations.

Jiatong Li, Simon Doclo

Subjects: Communications; Wireless Communications

Jiatong Li, Simon Doclo. Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement [EB/OL]. (2025-08-07) [2025-08-18]. https://arxiv.org/abs/2508.05293.
