National Preprint Platform

Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge

Source: arXiv
Abstract

This paper presents the architecture and performance of a novel multilingual automatic speech recognition (ASR) system developed by the Transsion Speech Team for Track 1 of the MLC-SLM 2025 Challenge. The system comprises three key components: 1) a frozen speech encoder based on Whisper-large-v3, whose large-scale pretraining provides robust acoustic feature extraction; 2) a trainable adaptor module, a Linear-ReLU-Linear transformation that aligns speech representations with the text representation space; and 3) a frozen Qwen2.5-7B-Instruct large language model (LLM) equipped with trainable LoRA modules for contextual linguistic decoding. By combining pretrained models with task-specific fine-tuning, the system achieved a word/character error rate (WER/CER) of 9.83% across the 11 languages of the evaluation set and ranked third among the global participants.
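
The three-part pipeline and the reported metric can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the names `Adaptor`, `LoRALinear` and `error_rate` are hypothetical, all dimensions are toy values, and the adaptor's intermediate width, the LoRA rank `r` and the scale `alpha` are assumptions because the abstract does not state them. The adaptor applies Linear-ReLU-Linear to each encoder frame independently (any frame stacking or downsampling the real system may use is not modeled). `LoRALinear` follows the standard LoRA formulation y = Wx + (alpha / r) * B(Ax), with W frozen and B initialized to zero. `error_rate` computes the usual Levenshtein-based (S + D + I) / N over words (WER) or characters (CER), without any challenge-specific text normalization.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output unit


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small uniform random initialization (a simplification for the sketch)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class Adaptor:
    """Hypothetical trainable Linear-ReLU-Linear projector, applied frame by frame.

    In the described system d_in would be the Whisper-large-v3 encoder width (1280)
    and d_out the Qwen2.5-7B hidden size (3584); d_hidden is an assumed value.
    """

    def __init__(self, d_in: int, d_hidden: int, d_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = rand_matrix(d_hidden, d_in, rng), [0.0] * d_hidden
        self.w2, self.b2 = rand_matrix(d_out, d_hidden, rng), [0.0] * d_out

    def __call__(self, frames: List[Vector]) -> List[Vector]:
        out = []
        for x in frames:
            h = [max(0.0, v) for v in linear(x, self.w1, self.b1)]  # ReLU
            out.append(linear(h, self.w2, self.b2))  # into the LLM embedding space
        return out


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, w_frozen: Matrix, r: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(w_frozen), len(w_frozen[0])
        self.w = w_frozen                           # frozen: excluded from training
        self.a = rand_matrix(r, d_in, rng)          # trainable, random init
        self.b = [[0.0] * r for _ in range(d_out)]  # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x: Vector) -> Vector:
        base = linear(x, self.w, [0.0] * len(self.w))
        ax = linear(x, self.a, [0.0] * len(self.a))
        delta = linear(ax, self.b, [0.0] * len(self.b))
        return [y + self.scale * d for y, d in zip(base, delta)]


def error_rate(ref: List[str], hyp: List[str]) -> float:
    """(S + D + I) / N from the Levenshtein distance; pass words for WER, characters for CER."""
    if not ref:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hyp) + 1))  # distance from the empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution, or a match at no cost
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    frames = [[0.5, -1.0, 0.25, 2.0] for _ in range(3)]  # 3 encoder frames, toy width 4
    embeds = Adaptor(d_in=4, d_hidden=8, d_out=6)(frames)
    print(len(embeds), len(embeds[0]))  # 3 6

    lora = LoRALinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], r=2, alpha=4.0)
    print(lora([2.0, 3.0]))  # [2.0, 3.0, 5.0], identical to W x because B starts at zero

    print(error_rate("the cat sat".split(), "the cat sit down".split()))  # 1 sub + 1 ins over 3 words
```

Because B starts at zero, the LoRA layer initially reproduces the frozen layer exactly, so fine-tuning starts from the pretrained LLM's behavior; in training, only A, B and the adaptor weights would receive gradient updates while the encoder and LLM weights stay fixed.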

Xiaoxiao Li, An Zhu, Youhai Jiang, Fengjie Zhu

Subjects: computational linguistics technology; computer technology

Xiaoxiao Li,An Zhu,Youhai Jiang,Fengjie Zhu.Transsion Multilingual Speech Recognition System for MLC-SLM 2025 Challenge[EB/OL].(2025-08-15)[2025-09-02].https://arxiv.org/abs/2508.14916.