Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge

Abstract

We present Task 5 of the DCASE 2025 Challenge: an Audio Question Answering (AQA) benchmark spanning multiple domains of sound understanding. This task defines three QA subsets (Bioacoustics, Temporal Soundscapes, and Complex QA) to test audio-language models on interactive question-answering over diverse acoustic scenes. We describe the dataset composition (from marine mammal calls to soundscapes and complex real-world clips), the evaluation protocol (top-1 accuracy with answer-shuffling robustness), and baseline systems (Qwen2-Audio-7B, AudioFlamingo 2, Gemini-2-Flash). Preliminary results on the development set are compared, showing strong variation across models and subsets. This challenge aims to advance the audio understanding and reasoning capabilities of audio-language models toward human-level acuity, which are crucial for enabling AI agents to perceive and interact with the world effectively.
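
The abstract names the metric as top-1 accuracy with answer-shuffling robustness but does not describe the procedure. The following is a minimal sketch of one plausible reading, not the challenge's official scorer. It assumes that each question is multiple choice and that the options are presented in several random orders. Top-1 accuracy is then averaged over all orderings, so a model that favours a fixed option position loses credit. The function name `shuffled_top1_accuracy`, the `predict` callable, the `n_shuffles` and `seed` parameters, and the `(question, options, answer)` item layout are all hypothetical.

```python
import random
from typing import Callable, Sequence

# Hypothetical item layout: (question text, answer options, correct option).
QAItem = tuple[str, Sequence[str], str]


def shuffled_top1_accuracy(
    items: Sequence[QAItem],
    predict: Callable[[str, list[str]], str],
    n_shuffles: int = 5,
    seed: int = 0,
) -> float:
    """Mean top-1 accuracy over n_shuffles random orderings of each item's options.

    Assumption: this is an illustrative reading of "answer-shuffling
    robustness", not the official DCASE 2025 Task 5 evaluation code.
    """
    rng = random.Random(seed)  # fixed seed so the shuffles are reproducible
    correct = 0
    total = 0
    for question, options, answer in items:
        for _ in range(n_shuffles):
            order = list(options)
            rng.shuffle(order)  # present the options in a new random order
            # The prediction counts only if the model picks the correct option text.
            correct += int(predict(question, order) == answer)
            total += 1
    return correct / total if total else 0.0
```

For example, a predictor that always returns `options[0]` scores near chance once the options are shuffled. A predictor that actually uses the question content is unaffected by the option order.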

Authors: Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro

Subject classification: Computing technology, computer technology

Recommended citation: Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, Sonal Kumar, Guirui Zhong, Zhifeng Kong, S Sakshi, Vaibhavi Lokegaonkar, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha, Gunhee Kim, Jun Du, Rafael Valle, Bryan Catanzaro. Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge[EB/OL]. (2025-05-12)[2025-06-05]. https://arxiv.org/abs/2505.07365.