Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives
This study explores BERTopic's potential for modeling open-ended Belgian Dutch daily narratives, contrasting its performance with Latent Dirichlet Allocation (LDA) and KMeans. Although LDA scores well on certain automated metrics, human evaluation reveals that its topics contain semantically irrelevant co-occurrences, highlighting the limitations of purely statistical methods. In contrast, BERTopic's reliance on contextual embeddings yields culturally resonant themes, underscoring the importance of hybrid evaluation frameworks that account for morphologically rich languages. KMeans produced less coherent topics than prior research suggested, pointing to the unique challenges posed by personal narratives. Our findings emphasize the need for robust generalization in NLP models, especially in underrepresented linguistic contexts.
Ratna Kandala, Katie Hoemann
Linguistics; Commonly Used Foreign Languages
Ratna Kandala, Katie Hoemann. Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives [EB/OL]. (2025-04-20) [2025-05-25]. https://arxiv.org/abs/2504.14707.