RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations

Source: arXiv
Abstract

We introduce RASMALAI, a large-scale speech dataset with rich text descriptions, designed to advance controllable and expressive text-to-speech (TTS) synthesis for 23 Indian languages and English. It comprises 13,000 hours of speech and 24 million text-description annotations with fine-grained attributes like speaker identity, accent, emotion, style, and background conditions. Using RASMALAI, we develop IndicParlerTTS, the first open-source, text-description-guided TTS for Indian languages. Systematic evaluation demonstrates its ability to generate high-quality speech for named speakers, reliably follow text descriptions and accurately synthesize specified attributes. Additionally, it effectively transfers expressive characteristics both within and across languages. IndicParlerTTS consistently achieves strong performance across these evaluations, setting a new standard for controllable multilingual expressive speech synthesis in Indian languages.

Ashwin Sankar, Yoach Lacombe, Sherry Thomas, Praveen Srinivasa Varadhan, Sanchit Gandhi, Mitesh M Khapra

Subject categories: Austroasiatic languages; computing technology, computer technology

Ashwin Sankar, Yoach Lacombe, Sherry Thomas, Praveen Srinivasa Varadhan, Sanchit Gandhi, Mitesh M Khapra. RASMALAI: Resources for Adaptive Speech Modeling in Indian Languages with Accents and Intonations [EB/OL]. (2025-05-24) [2025-07-17]. https://arxiv.org/abs/2505.18609.
