AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation

Abstract

AudioSet is a widely used benchmark in the audio research community and has significantly advanced a range of audio-related tasks. However, persistent problems with label accuracy and completeness remain critical bottlenecks that limit performance in downstream applications. To address these challenges, we propose a three-stage reannotation framework that harnesses general-purpose audio-language foundation models to systematically improve the label quality of AudioSet. The framework employs a cross-modal prompting strategy inspired by prompt chaining, in which prompts are composed sequentially to execute three subtasks: audio comprehension, label synthesis, and semantic alignment. Using this framework, we construct AudioSet-R, a high-quality, structured relabeled version of AudioSet. Extensive experiments on representative audio classification models (including AST, PANNs, SSAST, and AudioMAE) consistently show substantial performance improvements, validating the generalizability and effectiveness of the proposed approach in enhancing label reliability. The code is publicly available at: https://github.com/colaudiolab/AudioSet-R.
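The prompt-chaining idea in the abstract, where each stage's output becomes part of the next stage's prompt, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the model, the prompt wording, or how semantic alignment is performed. The `Model` type, `fake_model`, the prompt strings, and all function names are hypothetical, and `difflib` character similarity is a swapped-in placeholder for the semantic-alignment step against a fixed label vocabulary.

```python
from difflib import get_close_matches
from typing import Callable, List

# Hypothetical interface: an audio-language model wrapped as prompt -> text.
Model = Callable[[str], str]


def audio_comprehension(model: Model, clip_id: str) -> str:
    # Stage 1 (assumed prompt wording): ask the model to describe the clip.
    return model(f"Describe all sound events in clip {clip_id}.")


def label_synthesis(model: Model, description: str) -> List[str]:
    # Stage 2: the stage-1 output is embedded in a new prompt (the chain link).
    reply = model(f"List short sound labels, comma-separated, for: {description}")
    return [label.strip() for label in reply.split(",") if label.strip()]


def semantic_alignment(candidates: List[str], ontology: List[str]) -> List[str]:
    # Stage 3: map free-text labels onto a fixed vocabulary.
    # Placeholder: character-level similarity via difflib, not semantic matching.
    lowered = {name.lower(): name for name in ontology}
    aligned: List[str] = []
    for cand in candidates:
        match = get_close_matches(cand.lower(), list(lowered), n=1, cutoff=0.6)
        if match and lowered[match[0]] not in aligned:
            aligned.append(lowered[match[0]])
    return aligned


def reannotate(model: Model, clip_id: str, ontology: List[str]) -> List[str]:
    # Run the three stages in sequence, passing each output forward.
    description = audio_comprehension(model, clip_id)
    candidates = label_synthesis(model, description)
    return semantic_alignment(candidates, ontology)


def fake_model(prompt: str) -> str:
    # Hypothetical stub standing in for a real audio-language model.
    if prompt.startswith("Describe"):
        return "A dog barks while rain falls on a roof."
    return "dog, rain, roof"


if __name__ == "__main__":
    vocabulary = ["Dog", "Bark", "Rain", "Speech", "Music"]  # toy label set
    print(reannotate(fake_model, "clip_0001", vocabulary))  # → ['Dog', 'Rain']
```

With the stub model, stage 3 keeps "dog" and "rain" but drops "roof", which has no close match in the toy vocabulary. A real alignment step would need a meaning-aware similarity rather than character overlap, which is why that stage is only a placeholder here.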

Authors: Yulin Sun, Qisheng Xu, Yi Su, Qian Zhu, Yong Dou, Xinwang Liu, Kele Xu

DOI：10.1145/3746027.3758260

Subject classification: Computing technology; computer technology

Recommended citation: Yulin Sun, Qisheng Xu, Yi Su, Qian Zhu, Yong Dou, Xinwang Liu, Kele Xu. AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation[EB/OL]. (2025-08-21)[2025-09-02]. https://arxiv.org/abs/2508.15429.
