ChakmaNMT: A Low-resource Machine Translation On Chakma Language

Source: arXiv
Abstract

The geopolitical division between the indigenous Chakma population and mainstream Bangladesh creates a significant cultural and linguistic gap: the Chakma community, which mostly resides in the hill tracts of Bangladesh, maintains distinct cultural traditions and its own language. Developing a Machine Translation (MT) model for Chakma to Bangla could play a crucial role in narrowing this cultural-linguistic divide. We therefore work on MT between Chakma and Bangla (CCP-BN), introducing a novel dataset of 15,021 parallel samples and 42,783 monolingual samples of the Chakma language. We also introduce a small benchmark set of 600 parallel samples across Chakma, Bangla, and English. We ran traditional and state-of-the-art NLP models on the training set; fine-tuning BanglaT5 with back-translation, using a transliteration of Chakma, achieved the highest BLEU scores on the benchmark dataset: 17.8 for CCP-BN and 4.41 for BN-CCP. As far as we know, this is the first work on MT for the Chakma language. We hope this research will help bridge the gap in linguistic resources and contribute to preserving endangered languages. Our dataset link and code will be published soon.
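For readers unfamiliar with the BLEU metric quoted in the abstract, the following is a minimal sketch of a corpus-level BLEU computation. It is not the authors' evaluation script: the function names `ngrams` and `corpus_bleu` are illustrative, whitespace tokenization and a single reference per hypothesis are simplifying assumptions, and no smoothing is applied. Standard tools use their own tokenizers, so this sketch should not be expected to reproduce the reported scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Assumption: tokens are obtained by plain whitespace splitting.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches yields a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the references.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Because the score is a geometric mean over 1- to 4-gram precisions, a system can get many individual words right and still score low if it rarely reproduces longer word sequences, which is common in low-resource settings.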

Chumui Tripura, Rifat Shahriyar, Masum Hasan, Aunabil Chakma, Aditya Chakma, Soham Khisa

South Asian (Austroasiatic) languages; Linguistics; Commonly used foreign languages

Chumui Tripura, Rifat Shahriyar, Masum Hasan, Aunabil Chakma, Aditya Chakma, Soham Khisa. ChakmaNMT: A Low-resource Machine Translation On Chakma Language [EB/OL]. (2024-10-14) [2025-08-02]. https://arxiv.org/abs/2410.10219.
