Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models
While full-duplex speech agents promise natural, low-latency human-machine interaction by processing input and output speech concurrently, how they handle overlapping speech remains under-evaluated. We introduce Full-Duplex-Bench v1.5, a modular, fully automated benchmark that simulates four overlap scenarios: user interruption, listener backchannel, side conversation, and ambient speech. The framework supports both open-source and commercial models and offers a comprehensive, extensible metric suite (categorical dialogue behaviors, stop and response latency, prosodic adaptation, and perceived speech quality) that can be tailored to application-specific criteria. Benchmarking five state-of-the-art agents reveals two principal strategies, repair-first rapid yielding and continuity-first sustained flow, and highlights scenario-dependent performance trends. The open-source design allows seamless extension with new audio assets, languages, and deployment contexts, helping practitioners customize and accelerate the evaluation of robust full-duplex speech systems.
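The abstract names stop latency and response latency as metrics but does not define them here. The following is only a minimal sketch under stated assumptions, not the benchmark's implementation. It assumes each agent turn is available as timestamped speech segments (for example, from a voice activity detector). Stop latency is taken as the time from the onset of overlapping user speech until the agent's active segment ends. Response latency is taken as the time from the end of the user's speech until the agent's next speech onset. The `Segment` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical agent speech segment, with times in seconds."""
    start: float
    end: float


def stop_latency(agent: List[Segment], overlap_onset: float) -> Optional[float]:
    """Assumed definition: seconds from overlap onset until the agent stops talking.

    Returns None if the agent was not speaking when the overlap began.
    """
    for seg in agent:
        if seg.start <= overlap_onset < seg.end:
            return seg.end - overlap_onset
    return None


def response_latency(agent: List[Segment], user_end: float) -> Optional[float]:
    """Assumed definition: seconds from the end of user speech to the agent's next onset.

    Returns None if the agent never starts speaking again after user_end.
    """
    onsets = [seg.start for seg in agent if seg.start >= user_end]
    return min(onsets) - user_end if onsets else None


if __name__ == "__main__":
    # Agent talks 0.0-3.2 s, then resumes 4.1-6.0 s; user interrupts 2.5-3.6 s.
    agent = [Segment(0.0, 3.2), Segment(4.1, 6.0)]
    print(round(stop_latency(agent, 2.5), 2))      # 0.7
    print(round(response_latency(agent, 3.6), 2))  # 0.5
```

Under these assumptions, a lower stop latency corresponds to faster yielding, which is the behavior a repair-first agent would favor over a continuity-first one.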
Guan-Ting Lin, Shih-Yun Shan Kuan, Qirui Wang, Jiachen Lian, Tingle Li, Hung-yi Lee
Communications; Wireless Communications; Computing Technology, Computer Technology
Guan-Ting Lin, Shih-Yun Shan Kuan, Qirui Wang, Jiachen Lian, Tingle Li, Hung-yi Lee. Full-Duplex-Bench v1.5: Evaluating Overlap Handling for Full-Duplex Speech Models [EB/OL]. (2025-07-30) [2025-08-07]. https://arxiv.org/abs/2507.23159.