National Preprint Platform

Can Masked Autoencoders Also Listen to Birds?

Source: arXiv
Abstract

Masked Autoencoders (MAEs) have shown competitive results in audio classification by learning rich semantic representations through an efficient self-supervised reconstruction task. However, general-purpose models do not generalize well when applied directly to fine-grained audio domains. Bird-sound classification, in particular, requires distinguishing subtle inter-species differences while coping with high intra-species acoustic variability, which exposes the performance limits of general-domain Audio-MAE models. This work shows that bridging this domain gap requires more than domain-specific pretraining data: the entire training pipeline must be adapted. We systematically revisit the pretraining recipe, the fine-tuning methods, and the use of frozen features, and adapt each to bird sounds using BirdSet, a large-scale bioacoustic dataset comparable to AudioSet. The resulting Bird-MAE achieves new state-of-the-art results on BirdSet's multi-label classification benchmark. We also introduce prototypical probing, a parameter-efficient method that improves the utility of frozen MAE representations and comes close to fine-tuning performance in low-resource settings. Bird-MAE's prototypical probes outperform linear probing by up to 37 percentage points in MAP and narrow the gap to fine-tuning to about 3.3 percentage points on average across the BirdSet downstream tasks. With prototypical probing, Bird-MAE also shows robust few-shot capabilities on our newly established BirdSet few-shot benchmark, highlighting the potential of tailored self-supervised learning pipelines for fine-grained audio domains.
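The abstract names prototypical probing only at a high level. The sketch below shows one plausible forward pass for a prototype-based probe on top of frozen patch embeddings, to make the idea concrete. It assumes cosine similarity between each prototype and each patch, max-pooling over patches, and a per-class weighted sum of that class's prototype scores followed by a sigmoid (multi-label). These choices and every function name (`cosine`, `prototype_scores`, `class_logits`) are hypothetical illustrations, not Bird-MAE's actual probe head.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prototype_scores(patches: List[Vector], prototypes: List[Vector]) -> List[float]:
    """For each prototype, keep its best cosine match over all frozen patch embeddings."""
    return [max(cosine(p, q) for p in patches) for q in prototypes]


def class_logits(scores: List[float], protos_per_class: int, weights: List[float]) -> List[float]:
    """Hypothetical head: each class logit is a weighted sum of that class's own prototype scores.

    Prototypes are assumed to be stored class by class, protos_per_class at a time.
    """
    n_classes = len(scores) // protos_per_class
    logits = []
    for c in range(n_classes):
        block = range(c * protos_per_class, (c + 1) * protos_per_class)
        logits.append(sum(weights[j] * scores[j] for j in block))
    return logits


def sigmoid(x: float) -> float:
    """Independent per-class probability, as used for multi-label outputs."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Toy example: two frozen patch embeddings, two classes with one prototype each.
    patches = [[1.0, 0.0], [0.6, 0.8]]
    prototypes = [[1.0, 0.0], [0.0, 1.0]]
    scores = prototype_scores(patches, prototypes)
    logits = class_logits(scores, protos_per_class=1, weights=[1.0, 1.0])
    print([round(sigmoid(z), 3) for z in logits])
```

In a trained probe of this kind, only the prototypes and the head weights would be learned while the MAE encoder stays frozen, which is what makes the approach parameter-efficient; the sketch covers inference only and omits training.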

Lukas Rauch, René Heinrich, Ilyass Moummad, Alexis Joly, Bernhard Sick, Christoph Scholz

Subjects: Biological science research methods; biological science research techniques

Lukas Rauch, René Heinrich, Ilyass Moummad, Alexis Joly, Bernhard Sick, Christoph Scholz. Can Masked Autoencoders Also Listen to Birds? [EB/OL]. (2025-04-17) [2025-07-16]. https://arxiv.org/abs/2504.12880.
