Benchmarking Music Generation Models and Metrics via Human Preference Studies

Abstract

Recent advancements have brought generated music closer to human-created compositions, yet evaluating these models remains challenging. While human preference is the gold standard for assessing quality, translating these subjective judgments into objective metrics, particularly for text-audio alignment and music quality, has proven difficult. In this work, we generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants to evaluate the correlation between human preferences and widely used metrics. To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference. To further the field of subjective metric evaluation, we provide open access to our dataset of generated music and human evaluations.

Authors: Florian Grötschla, Ahmet Solak, Luca A. Lanzendörfer, Roger Wattenhofer

DOI: 10.1109/ICASSP49660.2025.10887745

Subject classification: computing and computer technology; information and knowledge dissemination

Recommended citation: Florian Grötschla, Ahmet Solak, Luca A. Lanzendörfer, Roger Wattenhofer. Benchmarking Music Generation Models and Metrics via Human Preference Studies [EB/OL]. (2025-06-23) [2025-07-17]. https://arxiv.org/abs/2506.19085.
