国家预印本平台

A Combination Clustering Method Based on Sampling Using Approximate Aggregation

Abstract

The time complexity of the k-means and Fuzzy C-Means clustering algorithms is acceptable for mining massive data sets, but their results depend heavily on initialization and are therefore unstable. With a well-chosen initial point set, both algorithms cluster well; with a badly skewed one, they often fail to produce an acceptable result. The k-medoids substitution clustering method, which is based on the idea of the simplex method, clusters well-structured data sets (those whose clusters are of similar size) effectively and, in particular, is much less sensitive to the initial medoid set, so it is a good choice when the data set is small. Its main drawback is its high time complexity, which rules out applying it directly to massive data sets. To overcome the shortcomings of both kinds of algorithm while exploiting their strengths, a combination clustering method based on sampling using approximate aggregation is proposed. Simulation experiments show that this hybrid method clusters as well as the k-medoids substitution method, while in the general case its time complexity is only O(n*n*m).
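The abstract does not spell out the individual steps of the method, so the following Python sketch is only one plausible reading of it, not the author's algorithm. It assumes that the "approximate classes" come from a cheap k-means over-partition, that a few points are sampled from each class, that a PAM-style greedy medoid swap stands in for the k-medoids substitution method, and that cost is measured by squared Euclidean distance. The names `combination_clustering`, `n_approx`, and `per_class` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns a cluster label for every point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cost(points, medoids):
    """Sum of squared distances from each point to its nearest medoid."""
    return sum(min(sq_dist(p, m) for m in medoids) for p in points)


def kmedoids_swap(sample, k, rng):
    """Greedy medoid substitution (PAM-style): swap while the cost drops."""
    medoids = rng.sample(sample, k)
    best = cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in sample:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                c = cost(sample, trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return medoids


def combination_clustering(points, k, n_approx=None, per_class=5, seed=0):
    """Hypothetical pipeline: over-partition, sample, swap medoids, assign."""
    rng = random.Random(seed)
    n_approx = n_approx or 3 * k  # assumed default, not from the paper
    # Step 1: over-partition cheaply into approximate classes with k-means.
    labels = kmeans(points, n_approx, rng)
    # Step 2: draw a few points from every approximate class.
    sample = []
    for c in range(n_approx):
        members = [p for p, l in zip(points, labels) if l == c]
        sample.extend(rng.sample(members, min(per_class, len(members))))
    # Step 3: run the expensive medoid substitution on the small sample only.
    medoids = kmedoids_swap(sample, k, rng)
    # Step 4: assign every point of the full data set to its nearest medoid.
    return medoids, [nearest(p, medoids) for p in points]
```

Calling `combination_clustering(points, k=2)` on a list of coordinate tuples returns the chosen medoids and one label per point. The intent of this design is that the costly swap search only ever sees the sample, while sampling from every approximate class keeps each region of the data represented.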

陈新泉

Computing technology, computer technology

Keywords: k-means clustering; k-medoids substitution clustering method; combination clustering method; sampling based on approximate aggregation

陈新泉. A Combination Clustering Method Based on Sampling Using Approximate Aggregation [EB/OL]. (2005-12-22) [2025-08-16]. http://www.paper.edu.cn/releasepaper/content/200512-588.
