Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization

Source: arXiv
Abstract

Consumer Health Queries (CHQs) in Bengali (Bangla), a low-resource language, often contain extraneous details, complicating efficient medical responses. This study investigates the zero-shot performance of nine advanced large language models (LLMs): GPT-3.5-Turbo, GPT-4, Claude-3.5-Sonnet, Llama3-70b-Instruct, Mixtral-8x22b-Instruct, Gemini-1.5-Pro, Qwen2-72b-Instruct, Gemma-2-27b, and Athene-70B, in summarizing Bangla CHQs. Using the BanglaCHQ-Summ dataset comprising 2,350 annotated query-summary pairs, we benchmarked these LLMs with ROUGE metrics against Bangla T5, a fine-tuned state-of-the-art model. Mixtral-8x22b-Instruct emerged as the top-performing model in ROUGE-1 and ROUGE-L, while Bangla T5 excelled in ROUGE-2. The results demonstrate that zero-shot LLMs can rival fine-tuned models, achieving high-quality summaries even without task-specific training. This work underscores the potential of LLMs in addressing challenges in low-resource languages, providing scalable solutions for healthcare query summarization.
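The abstract compares models on ROUGE-1, ROUGE-2, and ROUGE-L. As a rough illustration of what those scores measure, the sketch below computes F1-style variants from n-gram overlap and the longest common subsequence. It is not the authors' evaluation code: the abstract does not say which ROUGE implementation or tokenizer was used, so the whitespace tokenization, the F1 formulation, and the function names (`rouge_n`, `rouge_l`) are assumptions made for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 based on clipped n-gram overlap.

    Assumption: whitespace tokenization, which may differ from the
    tokenizer used in the paper.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    c, r = candidate.split(), reference.split()
    return f1(lcs_length(c, r), len(c), len(r))


if __name__ == "__main__":
    # Toy strings standing in for a model summary and a reference summary.
    candidate = "a b d"
    reference = "a b c d"
    print(rouge_n(candidate, reference, 1))
    print(rouge_n(candidate, reference, 2))
    print(rouge_l(candidate, reference))
```

ROUGE-2 rewards exact two-word matches, so it tends to favor models that reproduce the reference phrasing closely, while ROUGE-1 and ROUGE-L are more tolerant of rewording. That difference is one plausible reading of why a fine-tuned model could lead on ROUGE-2 while a zero-shot model leads on the other two.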

Ajwad Abrar, Farzana Tabassum, Sabbir Ahmed

Linguistics; Computing and computer technology; Austroasiatic languages

Ajwad Abrar, Farzana Tabassum, Sabbir Ahmed. Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization [EB/OL]. (2025-05-08) [2025-06-19]. https://arxiv.org/abs/2505.05070.
