Diffusion-based Symbolic Music Generation with Structured State Space Models

Source: arXiv
Abstract

Recent advancements in diffusion models have significantly improved symbolic music generation. However, most approaches rely on transformer-based architectures with self-attention mechanisms, which are constrained by quadratic computational complexity, limiting scalability for long sequences. To address this, we propose Symbolic Music Diffusion with Mamba (SMDIM), a novel diffusion-based architecture integrating Structured State Space Models (SSMs) for efficient global context modeling and the Mamba-FeedForward-Attention Block (MFA) for precise local detail preservation. The MFA Block combines the linear complexity of Mamba layers, the non-linear refinement of FeedForward layers, and the fine-grained precision of self-attention mechanisms, achieving a balance between scalability and musical expressiveness. SMDIM achieves near-linear complexity, making it highly efficient for long-sequence tasks. Evaluated on diverse datasets, including FolkDB, a collection of traditional Chinese folk music that represents an underexplored domain in symbolic music generation, SMDIM outperforms state-of-the-art models in both generation quality and computational efficiency. Beyond symbolic music, SMDIM's architectural design demonstrates adaptability to a broad range of long-sequence generation tasks, offering a scalable and efficient solution for coherent sequence modeling.
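The complexity contrast in the abstract can be illustrated with a toy example. The sketch below is not SMDIM, the MFA Block, or Mamba's selective scan; it is a minimal scalar linear state-space recurrence, written under the assumption that a single pass over the sequence stands in for the SSM side and a count of pairwise scores stands in for self-attention. All function names and parameter values (`ssm_scan`, `a`, `b`, `c`) are hypothetical and chosen only for illustration.

```python
def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Toy scalar state-space recurrence (illustrative, not Mamba itself).

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    One update per input element, so the cost grows linearly with len(xs).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # fold the new input into the running state
        ys.append(c * h)   # read the output from the state
    return ys


def scan_step_count(length):
    """Number of state updates the recurrence performs for a given length."""
    return length


def attention_pair_count(length):
    """Number of query-key scores full self-attention computes for a given length."""
    return length * length


if __name__ == "__main__":
    # An impulse at t=0 decays geometrically through the state.
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5))
    for n in (256, 1024, 4096):
        print(n, scan_step_count(n), attention_pair_count(n))
```

Quadrupling the sequence length quadruples the number of recurrence steps but multiplies the number of attention scores by sixteen, which is the scaling gap the abstract refers to when it contrasts near-linear complexity with quadratic self-attention.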

Shenghua Yuan, Xing Tang, Jiatao Chen, Tianming Xie, Jing Wang, Bing Shi

Subject: Computing technology, computer technology

Shenghua Yuan, Xing Tang, Jiatao Chen, Tianming Xie, Jing Wang, Bing Shi. Diffusion-based Symbolic Music Generation with Structured State Space Models [EB/OL]. (2025-07-27) [2025-08-10]. https://arxiv.org/abs/2507.20128.
