National Preprint Platform

Technical Report on classification of literature related to children speech disorder


Source: arXiv
Abstract

This technical report presents a natural language processing (NLP)-based approach for systematically classifying scientific literature on childhood speech disorders. Using domain-specific keywords, we retrieved and filtered 4,804 relevant articles published after 2015 from the PubMed database. After cleaning and pre-processing the abstracts, we applied two topic modeling techniques, Latent Dirichlet Allocation (LDA) and BERTopic, to identify latent thematic structures in the corpus. To improve relevance and precision, we incorporated a custom stop word list tailored to speech pathology. Our models uncovered 14 clinically meaningful clusters, such as infantile hyperactivity and abnormal epileptic behavior. In evaluation, the LDA model achieved a coherence score of 0.42 and a log-perplexity of -7.5, indicating strong topic coherence and predictive performance. The BERTopic model assigned a low proportion of documents to its outlier topic (less than 20%), demonstrating its capacity to classify heterogeneous literature effectively. These results provide a foundation for automating literature reviews in speech-language pathology.
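The abstract names three measurable steps: filtering tokens with a custom stop word list, measuring how many documents BERTopic treats as outliers, and reporting an LDA perplexity figure. The standard-library Python sketch below illustrates each step under stated assumptions; it is not the authors' code. The stop word sets, the sample sentence, and the topic assignments are hypothetical. `outlier_proportion` relies on BERTopic's convention of assigning outlier documents to topic -1. `perplexity_from_bound` assumes the -7.5 figure reported for LDA is a per-word likelihood bound of the kind returned by gensim's `LdaModel.log_perplexity` (a true perplexity is always at least 1, so it cannot be negative), and converts it with the 2^(-bound) formula that gensim applies when it logs a perplexity estimate.

```python
import re

# Hypothetical stop word lists: the report's actual custom list is not given in the abstract.
GENERIC_STOP_WORDS = {"the", "a", "an", "of", "and", "in", "with", "to", "for", "was", "were", "is"}
DOMAIN_STOP_WORDS = {"child", "children", "speech", "disorder", "study", "patient", "patients"}


def preprocess(abstract: str, stop_words: set[str]) -> list[str]:
    """Lowercase, keep alphabetic tokens longer than two letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [t for t in tokens if len(t) > 2 and t not in stop_words]


def outlier_proportion(topic_ids: list[int]) -> float:
    """Fraction of documents assigned to BERTopic's outlier topic (-1)."""
    if not topic_ids:
        return 0.0
    return sum(1 for t in topic_ids if t == -1) / len(topic_ids)


def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert a per-word likelihood bound to a perplexity estimate via 2 ** (-bound).

    Assumption: this mirrors the conversion gensim applies when logging perplexity.
    """
    return 2.0 ** (-per_word_bound)


if __name__ == "__main__":
    text = "Children with speech disorder were assessed for epileptic behavior."
    print(preprocess(text, GENERIC_STOP_WORDS | DOMAIN_STOP_WORDS))
    # ['assessed', 'epileptic', 'behavior']
    # Hypothetical topic assignments for ten documents; two are outliers.
    print(outlier_proportion([0, 3, -1, 2, 5, -1, 1, 0, 4, 7]))  # 0.2
    print(round(perplexity_from_bound(-7.5), 1))  # 181.0
```

Under the assumption above, a bound of -7.5 corresponds to a perplexity estimate of roughly 181, and an outlier proportion below 0.2 matches the "less than 20%" threshold quoted for BERTopic.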

Ziang Wang, Amir Aryani

Linguistics

Ziang Wang, Amir Aryani. Technical Report on classification of literature related to children speech disorder [EB/OL]. (2025-05-20) [2025-08-02]. https://arxiv.org/abs/2505.14242.
