National Preprint Platform
Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives

Source: arXiv
Abstract

This study explores BERTopic's potential for modeling open-ended Belgian Dutch daily narratives and compares its performance with Latent Dirichlet Allocation (LDA) and KMeans. Although LDA scores well on certain automated metrics, human evaluations reveal that its topics contain semantically irrelevant co-occurrences, highlighting the limitations of purely statistics-based methods. In contrast, BERTopic's reliance on contextual embeddings yields culturally resonant themes, underscoring the importance of hybrid evaluation frameworks that account for morphologically rich languages. KMeans produces less coherent topics than prior research suggested, pointing to the unique challenges posed by personal narratives. Our findings emphasize the need for robust generalization in NLP models, especially in underrepresented linguistic contexts.
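The abstract does not name the automated metrics used. A common family in topic-modeling evaluation is co-occurrence-based coherence, such as normalized pointwise mutual information (NPMI). The following minimal sketch assumes NPMI computed over document-level co-occurrence. It illustrates why such a metric can reward a purely statistical model: it scores a topic highly whenever its words appear in the same documents, whether or not a human would judge them semantically related. The function name `npmi_coherence` and the toy Dutch-token corpus are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of a topic (hypothetical helper).

    Assumption: probabilities are estimated as the fraction of documents
    containing the word(s); `documents` is a list of token lists.
    """
    if len(topic_words) < 2:
        raise ValueError("a topic needs at least two words to score pairs")
    doc_sets = [set(d) for d in documents]
    n = len(doc_sets)

    def p(*words):
        # Fraction of documents that contain every one of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0:
            # Words never co-occur: use the minimum NPMI value by convention.
            scores.append(-1.0)
            continue
        pmi = math.log(p12 / (p1 * p2))
        denom = -math.log(p12)
        # If both words occur in every document, NPMI is 0/0; treat it as maximal.
        scores.append(1.0 if denom == 0 else pmi / denom)
    return sum(scores) / len(scores)


# Toy corpus (invented for illustration only).
docs = [
    ["koffie", "ontbijt", "ochtend"],
    ["koffie", "ochtend", "werk"],
    ["fiets", "werk", "regen"],
    ["fiets", "regen", "avond"],
]
print(npmi_coherence(["koffie", "ochtend"], docs))
print(npmi_coherence(["koffie", "ochtend", "regen"], docs))
```

The score depends only on co-occurrence counts. A topic built from words that merely appear together often would score well here even if human raters found the grouping meaningless, which is the gap between automated and human evaluation that the abstract describes.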

Ratna Kandala, Katie Hoemann

Subject: Linguistics; Commonly Used Foreign Languages

Ratna Kandala, Katie Hoemann. Evaluating BERTopic on Open-Ended Data: A Case Study with Belgian Dutch Daily Narratives[EB/OL]. (2025-04-20)[2025-05-25]. https://arxiv.org/abs/2504.14707.
