A Training-Free Approach for Music Style Transfer with Latent Diffusion Models
Music style transfer enables personalized music creation by combining the structure of one piece with the stylistic characteristics of another. While recent approaches have explored text-conditioned generation and diffusion-based synthesis, most require extensive training, paired datasets, or detailed textual annotations. In this work, we introduce Stylus, a novel training-free framework for music style transfer that directly manipulates the self-attention layers of a pre-trained Latent Diffusion Model (LDM). Operating in the mel-spectrogram domain, Stylus transfers musical style by replacing key and value representations from the content audio with those of the style reference, without any fine-tuning. To enhance stylization quality and controllability, we further incorporate query preservation, CFG-inspired guidance scaling, multi-style interpolation, and phase-preserving reconstruction. Our method significantly improves perceptual quality and structural preservation compared to prior work, while remaining lightweight and easy to deploy. This work highlights the potential of diffusion-based attention manipulation for efficient, high-fidelity, and interpretable music generation without any training. Code will be released upon acceptance.
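The core operation described above, swapping self-attention keys and values with those of the style reference while preserving the content queries, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the cached-feature arguments, and the `gamma` blending knob are assumptions standing in for the paper's query-preservation mechanism.

```python
import torch


def kv_style_swap_attention(q_cur, q_content, k_style, v_style, gamma=0.75):
    """Self-attention step with key/value features swapped in from a style reference.

    q_cur:     (batch, tokens, dim) queries of the current stylized denoising pass.
    q_content: (batch, tokens, dim) queries cached from the content audio's pass
               at the same timestep and layer.
    k_style, v_style: (batch, tokens, dim) keys/values cached from the style
               reference's pass at the same timestep and layer.
    gamma:     query-preservation blend (hypothetical knob; gamma=1 keeps only
               the content queries).
    """
    # Query preservation: bias attention toward the content's structure.
    q = gamma * q_content + (1.0 - gamma) * q_cur
    # K/V swap: attend over the style reference's keys and read out its values,
    # injecting the style's timbre/texture into the output features.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k_style.transpose(-1, -2) * scale, dim=-1)
    return attn @ v_style
```

In practice such a function would be plugged into the LDM's self-attention layers (e.g., via a custom attention processor) during denoising of the content mel-spectrogram latents, with the style features pre-computed from a separate pass over the style reference.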
Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha
Computing Technology, Computer Technology
Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha. A Training-Free Approach for Music Style Transfer with Latent Diffusion Models [EB/OL]. (2025-08-13) [2025-08-24]. https://arxiv.org/abs/2411.15913