Clustering scientific publications: lessons learned through experiments with a real citation network
Clustering scientific publications can reveal the underlying research structures within bibliographic databases. Graph-based clustering methods, such as the spectral, Louvain, and Leiden algorithms, are widely used because they model citation networks well. However, their performance may degrade when applied to real-world data. This study evaluates these clustering algorithms on a citation graph comprising approximately 700,000 papers and 4.6 million citations extracted from Web of Science. The results show that while scalable methods like Louvain and Leiden run efficiently, their default settings often yield poor partitions. Meaningful outcomes require careful parameter tuning, especially for large networks with uneven structure, such as a dense core surrounded by loosely connected papers. These findings offer practical lessons on the challenges of large-scale data and on selecting and tuning methods to suit the specific structure of bibliometric clustering tasks.
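To illustrate why default settings can matter for modularity-based methods such as Louvain and Leiden, the sketch below scores two candidate partitions of a toy graph under different values of a resolution parameter. The abstract does not say which parameters the authors tuned; the resolution parameter, the toy graph, and the helper name `modularity` are assumptions chosen for illustration only, and the code is not the authors' pipeline.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, membership, resolution=1.0):
    """Modularity of an undirected, unweighted graph for a given partition.

    Q = sum over clusters c of  L_c / m  -  resolution * (d_c / (2 m))^2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the cluster
    degree_sum = defaultdict(int)  # total degree of the cluster's nodes
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical toy graph: two 4-node cliques joined by a single edge.
clique_a = list(combinations(range(0, 4), 2))
clique_b = list(combinations(range(4, 8), 2))
edges = clique_a + clique_b + [(3, 4)]

# Two candidate partitions: everything in one cluster, or one cluster per clique.
merged = {node: 0 for node in range(8)}
split = {node: 0 if node < 4 else 1 for node in range(8)}

for gamma in (0.1, 1.0):
    q_merged = modularity(edges, merged, gamma)
    q_split = modularity(edges, split, gamma)
    better = "split" if q_split > q_merged else "merged"
    print(f"resolution={gamma}: merged={q_merged:.3f}, split={q_split:.3f}, better={better}")
```

On this toy graph, a low resolution value favors lumping both cliques into one cluster, while the conventional value of 1.0 favors separating them. The objective being optimized, and therefore the partition an algorithm returns, shifts with the setting.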
Vu Thi Huong, Thorsten Koch
Research methods in the natural sciences; information science and information technology
Vu Thi Huong, Thorsten Koch. Clustering scientific publications: lessons learned through experiments with a real citation network [EB/OL]. (2025-05-15) [2025-06-06]. https://arxiv.org/abs/2505.18180.