
Sample-Efficient Language Model for Hinglish Conversational AI

Abstract

This paper describes our process for developing a sample-efficient language model for a conversational Hinglish chatbot. Hinglish, a code-mixed language that combines Hindi and English, poses a unique computational challenge because of inconsistent spelling, a lack of standardization, and a scarcity of high-quality conversational data. This work evaluates multiple pre-trained cross-lingual language models, including Gemma3-4B and Qwen2.5-7B, and employs fine-tuning techniques to improve performance on Hinglish conversational tasks. The proposed approach integrates synthetically generated dialogues with insights from existing Hinglish datasets to address data scarcity. Experimental results demonstrate that models with fewer parameters, when appropriately fine-tuned on high-quality code-mixed data, can achieve competitive performance on Hinglish conversation generation while remaining computationally efficient.

Authors: Sakshi Singh, Abhinav Prakash, Aakriti Shah, Chaitanya Sachdeva, Sanjana Dumpala

Subject classification: Indo-European languages

Recommended citation: Sakshi Singh, Abhinav Prakash, Aakriti Shah, Chaitanya Sachdeva, Sanjana Dumpala. Sample-Efficient Language Model for Hinglish Conversational AI [EB/OL]. (2025-04-26) [2025-05-21]. https://arxiv.org/abs/2504.19070.
