
CMU's IWSLT 2024 Simultaneous Speech Translation System

Source: arXiv
Abstract

This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner. Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder. We employ a two-stage training approach: we first align the representations of speech and text, then perform full fine-tuning. Both stages are trained on MuST-C v2 data with cross-entropy loss. We adapt our offline ST model for SST using a simple fixed hold-n policy. Experiments show that our model obtains an offline BLEU score of 31.1 and a BLEU score of 29.5 under 2 seconds of latency on the MuST-C v2 tst-COMMON set.
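
The fixed hold-n policy mentioned in the abstract adapts an offline ST model for streaming by repeatedly re-decoding the growing speech prefix and withholding the last n tokens of each partial hypothesis until more audio arrives. The following is a minimal sketch of such a decoding loop, not the submission's actual code: the offline_translate wrapper, the chunking, and n=3 are illustrative assumptions.

    def hold_n_stream(speech_chunks, offline_translate, n=3):
        """Yield committed target tokens as speech arrives, holding back the
        last n tokens of each partial hypothesis until later audio confirms them.

        speech_chunks: iterable of incoming audio segments (illustrative).
        offline_translate: hypothetical wrapper that runs the offline ST model
        (e.g. WavLM encoder + adapter + LLM decoder) on a speech prefix and
        returns a list of target-language tokens.
        """
        committed = []       # tokens already emitted; never retracted
        speech_so_far = []   # growing speech prefix

        for chunk in speech_chunks:
            speech_so_far.append(chunk)
            # Re-decode the full speech prefix with the offline model.
            hypothesis = offline_translate(speech_so_far)
            # Hold back the last n tokens; they may still change as more
            # speech arrives.
            stable = hypothesis[:-n] if n > 0 else hypothesis
            # Emit only new tokens that monotonically extend what was committed.
            if len(stable) > len(committed) and stable[:len(committed)] == committed:
                new_tokens = stable[len(committed):]
                committed.extend(new_tokens)
                yield from new_tokens

        # At the end of the utterance, flush the remainder of the final hypothesis.
        final = offline_translate(speech_so_far)
        yield from final[len(committed):]

Larger n commits tokens more conservatively, trading additional latency for fewer premature (and unretractable) outputs; the abstract's reported 29.5 BLEU under 2 seconds of latency reflects this kind of latency-quality trade-off.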

Patrick Fernandes, Xi Xu, Graham Neubig, William Chen, Brian Yan, Siqi Ouyang, Lei Li, Shinji Watanabe

Subjects: Communication and Computing Technology; Computer Technology and Electronic Technology Applications

Patrick Fernandes, Xi Xu, Graham Neubig, William Chen, Brian Yan, Siqi Ouyang, Lei Li, Shinji Watanabe. CMU's IWSLT 2024 Simultaneous Speech Translation System [EB/OL]. (2024-08-14) [2025-07-09]. https://arxiv.org/abs/2408.07452.
