ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization

Source: arXiv
Abstract

In recent years, diffusion-based generative models, including Denoising Diffusion Probabilistic Models (DDPM), have demonstrated remarkable performance in speech conversion. However, this performance comes at the cost of a large number of sampling steps, which hinders practical application in real-world scenarios. In this paper, we introduce ReFlow-VC, a novel high-fidelity speech conversion method based on rectified flow. Specifically, ReFlow-VC is an Ordinary Differential Equation (ODE) model that transforms a Gaussian distribution to the true Mel-spectrogram distribution along the most direct path. Furthermore, we propose a modeling approach that optimizes speaker features by utilizing both content and pitch information, allowing speaker features to reflect the properties of the current speech more accurately. Experimental results show that ReFlow-VC performs exceptionally well on small datasets and in zero-shot scenarios.
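
To make the "most direct path" idea concrete, here is a minimal NumPy sketch of rectified-flow sampling in general. It is not the ReFlow-VC implementation. All names (`interpolate`, `toy_velocity`, `euler_sample`) are hypothetical, the three-element `target` merely stands in for a Mel-spectrogram frame, and the closed-form velocity for a single-point target replaces the trained network a real system would use. Under those assumptions, the sketch shows why a straight-line ODE path can be integrated with very few Euler steps.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1 at time t in [0, 1]."""
    return (1.0 - t) * x0 + t * x1


def toy_velocity(x, t, target):
    """Closed-form velocity when the data distribution is a single point.

    Assumption: this stands in for a trained velocity network. Along the
    straight path it equals (target - x0), i.e. it is constant in time.
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        x = x + dt * velocity_fn(x, t)
    return x


rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])  # hypothetical stand-in for a Mel frame
noise = rng.standard_normal(3)       # sample from the Gaussian prior


def velocity(x, t):
    return toy_velocity(x, t, target)


one_step = euler_sample(noise, velocity, num_steps=1)
ten_steps = euler_sample(noise, velocity, num_steps=10)
midpoint = interpolate(noise, target, 0.5)
```

In this toy setting the trajectory is exactly straight, so one Euler step and ten Euler steps land on the same point. A learned velocity field is only approximately straight, so the number of steps needed in practice is an empirical question that this sketch does not answer.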

Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li

Computing technology, computer technology

Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li. ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization [EB/OL]. (2025-06-01) [2025-06-22]. https://arxiv.org/abs/2506.01032.
