National Preprint Platform

MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners

MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners

Source: arXiv
Abstract

We propose MuseControlLite, a lightweight mechanism for fine-tuning text-to-music generation models to condition precisely on various time-varying musical attributes and reference audio signals. The key finding is that positional embeddings, which text-to-music generation models have seldom used in the conditioner for text conditions, are critical when the condition of interest is a function of time. Using melody control as an example, our experiments show that simply adding rotary positional embeddings to the decoupled cross-attention layers increases control accuracy from 56.6% to 61.1%, while requiring 6.75 times fewer trainable parameters than state-of-the-art fine-tuning mechanisms, using the same pre-trained diffusion Transformer model of Stable Audio Open. We evaluate various forms of musical attribute control, audio inpainting, and audio outpainting, demonstrating improved controllability over MusicGen-Large and Stable Audio Open ControlNet at a significantly lower fine-tuning cost, with only 85M trainable parameters. Source code, model checkpoints, and demo examples are available at: https://musecontrollite.github.io/web/.
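
The central idea in the abstract is that the attention layer that injects a time-varying condition needs to know where each frame sits in time. The NumPy sketch below illustrates this under stated assumptions and is not the authors' implementation. It assumes an IP-Adapter-style decoupled cross-attention layer, in which a frozen text branch and a separately projected condition branch are computed in parallel and their outputs summed. It also assumes the common "rotate-half" rotary positional embedding (RoPE) convention and condition frames aligned one-to-one with latent frames. The names `decoupled_cross_attention`, `W`, `scale`, and `use_rope` are hypothetical.

```python
import numpy as np


def rope(x, positions, base=10000.0):
    # Rotary positional embedding ("rotate-half" convention, an assumption):
    # feature i is paired with feature i + dim/2, and each pair is rotated
    # by an angle equal to position * freq_i.
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)            # (half,)
    angles = positions[:, None] * freqs[None, :]          # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)               # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Single-head scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def decoupled_cross_attention(h, text, cond, W, scale=1.0, use_rope=True):
    # h: (T, d) latent frames; text: (L, d) text tokens; cond: (T, d) per-frame condition.
    # Hypothetical weights: W["q"], W["k_t"], W["v_t"] stand in for the frozen text branch;
    # W["k_c"], W["v_c"] stand in for the new, trainable condition branch.
    q = h @ W["q"]
    out_text = attention(q, text @ W["k_t"], text @ W["v_t"])  # text branch: no positions
    k_c, v_c = cond @ W["k_c"], cond @ W["v_c"]
    q_c = q
    if use_rope:
        # Rotate queries and condition keys by their frame index (assumes aligned frame rates).
        q_c = rope(q, np.arange(q.shape[0], dtype=float))
        k_c = rope(k_c, np.arange(k_c.shape[0], dtype=float))
    return out_text + scale * attention(q_c, k_c, v_c)


# Usage with random toy tensors.
rng = np.random.default_rng(0)
d = 8
W = {name: 0.1 * rng.standard_normal((d, d)) for name in ("q", "k_t", "v_t", "k_c", "v_c")}
h = rng.standard_normal((16, d))      # 16 latent time frames
text = rng.standard_normal((5, d))    # 5 text tokens (no time axis)
cond = rng.standard_normal((16, d))   # e.g. a per-frame melody feature
out = decoupled_cross_attention(h, text, cond, W)
print(out.shape)  # (16, 8)
```

Each query/key pair is rotated by an angle proportional to its frame index. As a result, the condition-branch attention score depends only on the relative offset between frames, which is what could let a time-varying condition such as a melody stay aligned with the generated audio. The text branch is left without positional information, consistent with the observation that text conditioners seldom use positional embeddings.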

Yi-Hsuan Yang, Shih-Lun Wu, Weijaw Lee, Sheng-Ping Yang, Bo-Rui Chen, Hao-Chung Cheng, Fang-Duo Tsai

Subject: Computing Technology, Computer Technology

Yi-Hsuan Yang, Shih-Lun Wu, Weijaw Lee, Sheng-Ping Yang, Bo-Rui Chen, Hao-Chung Cheng, Fang-Duo Tsai. MuseControlLite: Multifunctional Music Generation with Lightweight Conditioners [EB/OL]. (2025-06-24) [2025-07-16]. https://arxiv.org/abs/2506.18729.