ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization

Abstract

In recent years, diffusion-based generative models such as Denoising Diffusion Probabilistic Models (DDPM) have demonstrated remarkable performance in speech conversion. However, this performance comes at the cost of a large number of sampling steps, which hinders practical application in real-world scenarios. In this paper, we introduce ReFlow-VC, a novel high-fidelity speech conversion method based on rectified flow. Specifically, ReFlow-VC is an Ordinary Differential Equation (ODE) model that transforms a Gaussian distribution to the true Mel-spectrogram distribution along the most direct path. Furthermore, we propose a modeling approach that optimizes speaker features by utilizing both content and pitch information, allowing speaker features to reflect the properties of the current speech more accurately. Experimental results show that ReFlow-VC performs exceptionally well on small datasets and in zero-shot scenarios.
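
The "most direct path" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the standard rectified-flow setup, in which a sample moves along the straight line x_t = (1 - t) * x0 + t * x1 from Gaussian noise x0 to data x1, and the model is trained to predict the constant velocity x1 - x0. The names `interpolate`, `target_velocity`, `euler_sample`, and the 4-value `mel_frame` are hypothetical, and an oracle velocity stands in for the learned network, which in the paper would also depend on content, pitch, and speaker features.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity field; constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy data: a hypothetical 4-bin "Mel frame" and a Gaussian noise sample.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
mel_frame = [0.5, -1.0, 2.0, 0.0]


def oracle_velocity(x, t):
    """Assumption: a perfect velocity field, standing in for a trained network."""
    return target_velocity(noise, mel_frame)


one_step = euler_sample(oracle_velocity, noise, num_steps=1)
ten_steps = euler_sample(oracle_velocity, noise, num_steps=10)
```

With a perfectly straight path, one Euler step and ten Euler steps reach the same endpoint, which is the intuition behind using far fewer sampling steps than a DDPM-style sampler. A learned velocity field is only approximately straight, so the number of steps needed in practice depends on the trained model.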

Authors: Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li

Subject classification: Computing technology, computer technology

Recommended citation: Pengyu Ren, Wenhao Guan, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li. ReFlow-VC: Zero-shot Voice Conversion Based on Rectified Flow and Speaker Feature Optimization [EB/OL]. (2025-06-01) [2025-06-22]. https://arxiv.org/abs/2506.01032.
