ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages

Abstract

Sentiment analysis for regional dialects of Bangla remains underexplored because of linguistic diversity and limited annotated data. This paper introduces ANUBHUTI, a dataset of 2000 sentences manually translated from standard Bangla into four major regional dialects: Mymensingh, Noakhali, Sylhet, and Chittagong. The dataset predominantly features political and religious content, reflecting the contemporary sociopolitical landscape of Bangladesh, alongside neutral texts to maintain balance. Each sentence is annotated under a dual annotation scheme: multiclass thematic labeling categorizes a sentence as Political, Religious, or Neutral, and multilabel emotion annotation assigns one or more emotions from Anger, Contempt, Disgust, Enjoyment, Fear, Sadness, and Surprise. Expert native translators carried out the translation and annotation, and quality was assessed with Cohen's kappa inter-annotator agreement, which showed strong consistency across dialects. The dataset was further refined through systematic checks for missing data, anomalies, and inconsistencies. ANUBHUTI fills a critical gap in resources for sentiment analysis in low-resource Bangla dialects, enabling more accurate and context-aware natural language processing.
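The dual annotation scheme and the agreement check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the names `THEMES`, `EMOTIONS`, `encode_emotions`, and `cohens_kappa` are hypothetical, the label order is an assumption, and the sample annotations are made up for illustration rather than taken from the dataset.

```python
from collections import Counter

# Label sets named in the abstract; the ordering here is an assumption.
THEMES = ("Political", "Religious", "Neutral")
EMOTIONS = ("Anger", "Contempt", "Disgust", "Enjoyment",
            "Fear", "Sadness", "Surprise")


def encode_emotions(labels):
    """Turn a multilabel emotion annotation into a multi-hot vector."""
    unknown = set(labels) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    return [1 if emotion in labels else 0 for emotion in EMOTIONS]


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Illustrative (made-up) thematic labels from two annotators.
annotator_a = ["Political", "Religious", "Neutral", "Political"]
annotator_b = ["Political", "Religious", "Neutral", "Neutral"]
kappa = cohens_kappa(annotator_a, annotator_b)
emotion_vector = encode_emotions(["Anger", "Fear"])
```

The thematic label is a single class per sentence, so agreement on it can be scored directly with kappa. For the multilabel emotions, one option (again an assumption, not something the abstract specifies) is to compute kappa separately for each emotion column of the multi-hot vectors.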

Authors: Swastika Kundu, Mithila Rahman, Tanvir Ahmed, Autoshi Ibrahim

Subject classification: Indo-European languages

Recommended citation: Swastika Kundu, Mithila Rahman, Tanvir Ahmed, Autoshi Ibrahim. ANUBHUTI: A Comprehensive Corpus For Sentiment Analysis In Bangla Regional Languages [EB/OL]. (2025-06-26) [2025-07-16]. https://arxiv.org/abs/2506.21686.
