Semi-supervised Clustering Algorithm Based on Minimum Spanning Tree
MST-based Semi-supervised Clustering using M-labeled objects
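Most existing semi-supervised clustering algorithms are built on pairwise constraints, and they typically rely on prior knowledge to improve clustering accuracy. This paper instead uses another semi-supervised approach, label propagation, to help the algorithms detect clusters. We propose two semi-supervised clustering algorithms, K-SSMST and M-SSMST. Both handle datasets with multiple densities and irregularly shaped clusters well. K-SSMST uses a variant of the minimum spanning tree together with K labeled objects to find the natural clusters in a dataset. M-SSMST can discover new clusters when the supplied semi-supervised information is insufficient. The algorithms were tested on standard UCI datasets and on artificial datasets, and the results confirm that they outperform the compared algorithms.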
Most existing semi-supervised clustering algorithms depend on pairwise constraints, and they usually need a lot of prior knowledge to improve their accuracy. In this paper, we use another semi-supervised method, label propagation, to show how labeled objects help the algorithms detect clusters. We propose two new semi-supervised algorithms that can discover clusters of diverse density and arbitrary shape: MST-based Semi-Supervised clustering using K-labeled objects (K-SSMST) and MST-based Semi-Supervised clustering using M-labeled objects (M-SSMST). Based on a variant of the minimum spanning tree (K-MST), the two algorithms assign objects to clusters using the labeled objects. K-SSMST automatically finds the natural clusters in a dataset. It needs no input parameter and only requires K labeled objects, where K is the number of clusters. M-SSMST can detect new clusters when the number of labeled objects M is less than K, and it requires only one input parameter. We tested our algorithms on various artificial datasets and on UCI datasets. The results show that they are more accurate than the compared supervised and semi-supervised approaches.
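The abstract does not spell out how the MST variant propagates labels. One plausible reading, assumed here and not taken from the paper, is multi-source Prim's algorithm: grow a minimum spanning forest from all labeled objects at once over the complete Euclidean graph, and let each unlabeled object inherit the label of the tree it joins. The function name `seeded_mst_clustering` and the threshold `max_edge` are hypothetical. `max_edge` only loosely mimics the M-SSMST idea of opening a new cluster when fewer labels than clusters are given; the paper's actual single parameter is not described here.

```python
import heapq
import math
from typing import Dict, List, Sequence


def seeded_mst_clustering(
    points: Sequence[Sequence[float]],
    seeds: Dict[int, int],
    max_edge: float = math.inf,
) -> List[int]:
    """Hypothetical sketch: label points by growing an MST forest from seeds.

    Equivalent to Prim's algorithm started from a virtual root that is
    joined to every seed by a zero-weight edge. `seeds` maps a point index
    to a non-negative cluster label. If the cheapest edge into the next
    point exceeds `max_edge` (assumed parameter), that point opens a new
    cluster instead of joining an existing one.
    """
    n = len(points)
    labels = [-1] * n  # -1 means "not yet assigned"
    next_label = max(seeds.values(), default=-1) + 1
    heap = [(0.0, i, lab) for i, lab in seeds.items()]
    heapq.heapify(heap)
    assigned = 0
    while assigned < n:
        if not heap:
            # No existing cluster can reach the remaining points: start one.
            heap.append((0.0, labels.index(-1), next_label))
            next_label += 1
        dist, i, lab = heapq.heappop(heap)
        if labels[i] != -1:
            continue  # stale entry: already attached by a cheaper edge
        if dist > max_edge:
            # Too far from every current cluster: seed a new cluster here.
            lab = next_label
            next_label += 1
        labels[i] = lab
        assigned += 1
        # Offer edges from the newly attached point to all unassigned points.
        for j in range(n):
            if labels[j] == -1:
                heapq.heappush(heap, (math.dist(points[i], points[j]), j, lab))
    return labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(seeded_mst_clustering(pts, {0: 0, 3: 1}))          # K labels given
print(seeded_mst_clustering(pts, {0: 0}, max_edge=5.0))  # M < K labels given
```

With one label per cluster the two groups separate along the long MST edge. With a single label and the assumed threshold, the far group becomes a new cluster. A dense O(n^2) scan is used for clarity; a real implementation would build the MST once and reuse it.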
Huo Mengmeng (霍萌萌), Chen Xiaoyun (陈晓云), Liu Yangyang (刘阳阳)
Computing technology, computer technology
Keywords: data mining; semi-supervised clustering; label propagation; minimum spanning tree (MST)
Huo Mengmeng, Chen Xiaoyun, Liu Yangyang. Semi-supervised clustering algorithm based on minimum spanning tree [EB/OL]. (2012-04-06) [2025-08-16]. http://www.paper.edu.cn/releasepaper/content/201204-80.