Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots

Source: arXiv
Abstract

This paper introduces Binaural Sound Event Localization and Detection (BiSELD), a task that aims to jointly detect and localize multiple sound events using binaural audio, inspired by the spatial hearing mechanism of humans. To support this task, we present a synthetic benchmark dataset, called the Binaural Set, which simulates realistic auditory scenes using measured head-related transfer functions (HRTFs) and diverse sound events. To effectively address the BiSELD task, we propose a new input feature representation called the Binaural Time-Frequency Feature (BTFF), which encodes interaural time difference (ITD), interaural level difference (ILD), and high-frequency spectral cues (SC) from binaural signals. BTFF is composed of eight channels, including left and right mel-spectrograms, velocity-maps, SC-maps, and ITD-/ILD-maps, designed to cover different spatial cues across frequency bands and spatial axes. A CRNN-based model, BiSELDnet, is then developed to learn both spectro-temporal patterns and HRTF-based localization cues from BTFF. Experiments on the Binaural Set show that each BTFF sub-feature enhances task performance: V-map improves detection, ITD-/ILD-maps enable accurate horizontal localization, and SC-map captures vertical spatial cues. The final system achieves a SELD error of 0.110 with 87.1% F-score and 4.4° localization error, demonstrating the effectiveness of the proposed framework in mimicking human-like auditory perception.
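As a rough illustration of the two horizontal-plane cues named in the abstract, the sketch below estimates one broadband ITD and one broadband ILD from a single binaural frame. It is not the paper's BTFF computation, which builds time-frequency maps. The function name `estimate_itd_ild`, the 1 ms lag search window, and the sign convention (positive means the left ear leads and is louder) are all assumptions made for this example.

```python
import numpy as np


def estimate_itd_ild(left, right, fs, max_itd_s=1e-3):
    """Estimate a broadband ITD (seconds) and ILD (dB) for one binaural frame.

    Hypothetical helper for illustration only. Assumed sign convention:
    positive ITD means the left channel leads, positive ILD means the
    left channel carries more energy.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Cross-correlate via the FFT, zero-padded so circular wrap-around
    # cannot alias into the lag range we search.
    n = len(left) + len(right) - 1
    nfft = 1 << (n - 1).bit_length()
    spec_l = np.fft.rfft(left, nfft)
    spec_r = np.fft.rfft(right, nfft)
    # cc[k] = sum_n left[n] * right[n + k]; the peak sits at the lag (in
    # samples) by which the right channel trails the left channel.
    cc = np.fft.irfft(spec_r * np.conj(spec_l), nfft)

    # Search only physically plausible lags (assumed window: +/- max_itd_s).
    max_lag = int(round(max_itd_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    cc_window = cc[lags % nfft]  # negative lags wrap to the end of cc
    itd = lags[np.argmax(cc_window)] / fs

    # ILD as the energy ratio between ears, in decibels.
    eps = 1e-12  # guards against log of zero for silent frames
    ild = 10.0 * np.log10((np.sum(left**2) + eps) / (np.sum(right**2) + eps))
    return itd, ild


if __name__ == "__main__":
    # Synthetic frame: the right ear receives the same noise burst
    # 5 samples later and at half the amplitude.
    rng = np.random.default_rng(0)
    fs = 16000
    src = rng.standard_normal(1024)
    right = 0.5 * np.concatenate([np.zeros(5), src[:-5]])
    itd, ild = estimate_itd_ild(src, right, fs)
    print(f"ITD = {itd * 1e3:.4f} ms, ILD = {ild:.2f} dB")
```

With the synthetic frame above, the estimate should recover the 5-sample delay and a level difference of roughly 6 dB. A single broadband value per frame is the simplest possible form of these cues; per-band maps such as those the abstract describes would need the same idea applied across frequency.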

Gyeong-Tae Lee, Hyeonuk Nam, Yong-Hwa Park

Acoustic engineering

Gyeong-Tae Lee, Hyeonuk Nam, Yong-Hwa Park. Binaural Sound Event Localization and Detection based on HRTF Cues for Humanoid Robots [EB/OL]. (2025-07-28) [2025-08-10]. https://arxiv.org/abs/2507.20530.
