MOVER: Combining Multiple Meeting Recognition Systems
In this paper, we propose Meeting recognizer Output Voting Error Reduction (MOVER), a novel system combination method for meeting recognition tasks. Although there are methods to combine the outputs of diarization systems (e.g., DOVER) or of automatic speech recognition (ASR) systems (e.g., ROVER), MOVER is the first approach that can combine the outputs of meeting recognition systems that differ in both diarization and ASR. MOVER combines hypotheses with different time intervals and speaker labels through a five-stage process whose stages include speaker alignment, segment grouping, and word and timing combination. Experimental results on the CHiME-8 DASR task and the multi-channel track of the NOTSOFAR-1 task demonstrate that MOVER can successfully combine multiple meeting recognition systems with diverse diarization and recognition outputs, achieving relative tcpWER improvements of 9.55% and 8.51%, respectively, over the state-of-the-art systems for the two tasks.
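The abstract lists speaker alignment as one of MOVER's stages but does not say how it is performed. As an illustration only, the sketch below shows one common way to align speaker labels between two diarization hypotheses, in the spirit of DOVER: choose the one-to-one label mapping that maximizes the total overlapping speech time. This is an assumption, not MOVER's published procedure; the segment format `(speaker, start_sec, end_sec)` and the helpers `align_speakers` and `relabel` are hypothetical names introduced here.

```python
from itertools import permutations

# Hypothetical segment format: (speaker_label, start_sec, end_sec).
Segment = tuple[str, float, float]


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the temporal overlap between [a0, a1] and [b0, b1] (0 if disjoint)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def pair_overlap(ref: list[Segment], hyp: list[Segment]) -> dict[tuple[str, str], float]:
    """Total overlapping time for every (hyp_speaker, ref_speaker) pair."""
    totals: dict[tuple[str, str], float] = {}
    for hs, h0, h1 in hyp:
        for rs, r0, r1 in ref:
            totals[(hs, rs)] = totals.get((hs, rs), 0.0) + overlap(h0, h1, r0, r1)
    return totals


def align_speakers(ref: list[Segment], hyp: list[Segment]) -> dict[str, str]:
    """Map each hyp speaker to a distinct ref speaker, maximizing total overlap.

    Brute force over permutations, which is only practical for the handful of
    speakers typical of a meeting. Hyp speakers left over when the hyp has more
    speakers than the ref are mapped to placeholder labels "extra_<i>".
    """
    ref_spk = sorted({s for s, _, _ in ref})
    hyp_spk = sorted({s for s, _, _ in hyp})
    extra = [f"extra_{i}" for i in range(max(0, len(hyp_spk) - len(ref_spk)))]
    targets = ref_spk + extra
    ov = pair_overlap(ref, hyp)

    best: dict[str, str] = {}
    best_score = -1.0
    for perm in permutations(targets, len(hyp_spk)):
        score = sum(ov.get((h, r), 0.0) for h, r in zip(hyp_spk, perm))
        if score > best_score:
            best, best_score = dict(zip(hyp_spk, perm)), score
    return best


def relabel(hyp: list[Segment], mapping: dict[str, str]) -> list[Segment]:
    """Rewrite hyp speaker labels into the ref label space."""
    return [(mapping[s], t0, t1) for s, t0, t1 in hyp]


if __name__ == "__main__":
    ref = [("A", 0.0, 5.0), ("B", 5.0, 10.0)]
    hyp = [("spk1", 0.5, 5.2), ("spk0", 5.1, 9.8)]
    mapping = align_speakers(ref, hyp)
    print(mapping)  # {'spk0': 'B', 'spk1': 'A'}
    print(relabel(hyp, mapping))
```

For larger speaker counts, an assignment solver such as `scipy.optimize.linear_sum_assignment` on the negated overlap matrix would find the same optimum in polynomial time; the permutation search is used here only to keep the sketch dependency-free.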
Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani
Computational linguistics; Computer technology
Naoyuki Kamo, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani. MOVER: Combining Multiple Meeting Recognition Systems [EB/OL]. (2025-08-07) [2025-08-18]. https://arxiv.org/abs/2508.05055.