
A Training-Free Approach for Music Style Transfer with Latent Diffusion Models

Source: arXiv

Abstract

Music style transfer enables personalized music creation by combining the structure of one piece with the stylistic characteristics of another. While recent approaches have explored text-conditioned generation and diffusion-based synthesis, most require extensive training, paired datasets, or detailed textual annotations. In this work, we introduce Stylus, a novel training-free framework for music style transfer that directly manipulates the self-attention layers of a pre-trained Latent Diffusion Model (LDM). Operating in the mel-spectrogram domain, Stylus transfers musical style by replacing the key and value representations of the content audio with those of the style reference, without any fine-tuning. To enhance stylization quality and controllability, we further incorporate query preservation, CFG-inspired guidance scaling, multi-style interpolation, and phase-preserving reconstruction. Our method significantly improves perceptual quality and structural preservation compared to prior work, while remaining lightweight and easy to deploy. This work highlights the potential of diffusion-based attention manipulation for efficient, high-fidelity, and interpretable music generation without training. Code will be released upon acceptance.
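
The core operation the abstract describes, substituting a self-attention layer's keys and values with those of the style reference while preserving the content's queries, can be sketched as follows. This is a minimal PyTorch sketch under stated assumptions: the function names, the blending weight `gamma`, the guidance form, and the interpolation rule are illustrative guesses, not the authors' implementation, since the paper's code has not yet been released.

```python
import torch
import torch.nn.functional as F

def stylized_self_attention(q_stylized: torch.Tensor,
                            q_content: torch.Tensor,
                            k_style: torch.Tensor,
                            v_style: torch.Tensor,
                            gamma: float = 0.75) -> torch.Tensor:
    """Key/value substitution with query preservation (assumed form).

    Queries carrying the content's structure are kept, blended with the
    current stylized query by the assumed weight `gamma`, while keys and
    values are replaced wholesale by those of the style reference.
    """
    q = gamma * q_content + (1.0 - gamma) * q_stylized
    # Standard scaled dot-product attention over the swapped K/V.
    return F.scaled_dot_product_attention(q, k_style, v_style)

def guided_output(out_plain: torch.Tensor,
                  out_styled: torch.Tensor,
                  scale: float = 1.5) -> torch.Tensor:
    """One plausible reading of "CFG-inspired guidance scaling":
    extrapolate from the un-styled attention output toward the styled
    one, so scale > 1 strengthens the stylization."""
    return out_plain + scale * (out_styled - out_plain)

# Multi-style interpolation (assumed form): a convex combination of
# several style references' keys and values before the swap, e.g.
#   k_style = w1 * k_a + w2 * k_b   with w1 + w2 = 1, likewise for v.
```

In the paper's setting, these tensors would come from the self-attention layers of a pre-trained LDM denoising mel-spectrograms, with the waveform then recovered via phase-preserving reconstruction from the content audio.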

Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha

Computing Technology, Computer Technology

Heehwan Wang, Joonwoo Kwon, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha. A Training-Free Approach for Music Style Transfer with Latent Diffusion Models [EB/OL]. (2025-08-13) [2025-08-24]. https://arxiv.org/abs/2411.15913.