Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems

Source: arXiv
Abstract

Turn-taking is a fundamental component of spoken dialogue; however, conventional studies mostly involve dyadic settings. This work applies voice activity projection (VAP) to predict upcoming turn-taking in triadic multi-party scenarios. The goal of VAP models is to predict the future voice activity of each speaker using only acoustic data. This is the first study to extend VAP to triadic conversation. We trained multiple models on a Japanese triadic dataset in which participants discussed a variety of topics. We found that VAP trained on triadic conversation outperformed the baseline for all models, but that the type of conversation affected the accuracy. This study establishes that VAP can be used for turn-taking in triadic dialogue scenarios. Future work will incorporate this triadic VAP turn-taking model into spoken dialogue systems.

Mikey Elmers, Koji Inoue, Divesh Lala, Tatsuya Kawahara

Linguistics

Mikey Elmers, Koji Inoue, Divesh Lala, Tatsuya Kawahara. Triadic Multi-party Voice Activity Projection for Turn-taking in Spoken Dialogue Systems [EB/OL]. (2025-07-10) [2025-07-25]. https://arxiv.org/abs/2507.07518