Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages

Source: arXiv
Abstract

Automatic speech recognition (ASR) for dysarthric speech remains challenging due to data scarcity, particularly in non-English languages. To address this, we fine-tune a voice conversion (VC) model on English dysarthric speech (UASpeech) to encode both speaker characteristics and prosodic distortions, then apply it to convert healthy non-English speech (FLEURS) into non-English dysarthric-like speech. The generated data is then used to fine-tune a multilingual ASR model, Massively Multilingual Speech (MMS), for improved dysarthric speech recognition. Evaluation on PC-GITA (Spanish), EasyCall (Italian), and SSNCE (Tamil) demonstrates that VC with both speaker and prosody conversion significantly outperforms both the off-the-shelf MMS model and conventional augmentation techniques such as speed and tempo perturbation. Objective and subjective analyses of the generated data further confirm that the generated speech simulates dysarthric characteristics.

Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen

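Subject classification: Indo-European languages; South Indian languages (Dravidian languages); computing technology, computer technology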

Chin-Jou Li, Eunjung Yeo, Kwanghee Choi, Paula Andrea Pérez-Toro, Masao Someki, Rohan Kumar Das, Zhengjun Yue, Juan Rafael Orozco-Arroyave, Elmar Nöth, David R. Mortensen. Towards Inclusive ASR: Investigating Voice Conversion for Dysarthric Speech Recognition in Low-Resource Languages [EB/OL]. (2025-05-20) [2025-07-16]. https://arxiv.org/abs/2505.14874.
