Research on a Clustering Algorithm Based on K-means
This paper first examines cluster analysis methods. It compares several clustering algorithms and discusses the strengths and weaknesses of each. The original k-means algorithm has a known weakness: its clustering result depends heavily on the randomly chosen initial cluster centers. To address this, the paper proposes an improved algorithm. The improved algorithm samples the data set several times and selects the better initial cluster centers found across these samples. This greatly reduces its sensitivity to the choice of initial centers. In addition, once the initial cluster centers are selected, the initial data are standardized, which further improves clustering quality. The new algorithm, Hk-means, was tested on data from the UCI data sets. The results show that Hk-means produces significantly better clusterings than the original k-means algorithm, and the approach may serve as a reference for related fields.
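The abstract describes Hk-means only at a high level. The Python sketch below illustrates one possible reading of its two ideas, using only the standard library. First, it runs k-means on several random subsamples and keeps the candidate centers with the lowest sum of squared errors (SSE) on the full data set. Then, following the abstract's order, it z-score standardizes the data and those chosen centers with the same parameters and runs a final k-means pass.

Several parts are hypothetical illustrations, not the authors' published procedure: the subsample count `n_samples`, the subsample fraction `sample_frac`, the SSE selection criterion, and all function names (`sampled_initial_centers`, `hk_means_sketch`, and so on).

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to p (first one wins on ties)."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def kmeans(data: List[Point], centers: List[Point], max_iter: int = 100):
    """Plain Lloyd k-means from the given initial centers; returns (centers, labels, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in data:
            clusters[nearest(p, centers)].append(p)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        new_centers = [tuple(mean(col) for col in zip(*cl)) if cl else c
                       for cl, c in zip(clusters, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in data]
    sse = sum(sq_dist(p, centers[j]) for p, j in zip(data, labels))
    return centers, labels, sse


def sampled_initial_centers(data, k, n_samples=10, sample_frac=0.3, seed=0):
    """HYPOTHETICAL selection rule: run k-means on several random subsamples and
    keep the candidate centers whose SSE on the full data set is lowest."""
    rng = random.Random(seed)
    m = min(len(data), max(k, int(len(data) * sample_frac)))
    best, best_sse = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(data, m)
        candidates, _, _ = kmeans(sample, rng.sample(sample, k))
        sse = sum(min(sq_dist(p, c) for c in candidates) for p in data)
        if sse < best_sse:
            best, best_sse = candidates, sse
    return best


def zscore_params(data):
    """Per-feature mean and population standard deviation."""
    cols = list(zip(*data))
    return [mean(c) for c in cols], [pstdev(c) for c in cols]


def apply_zscore(points, mus, sds):
    """z = (x - mean) / std per feature; constant features map to 0."""
    return [tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(p, mus, sds))
            for p in points]


def hk_means_sketch(data, k, n_samples=10, sample_frac=0.3, seed=0):
    """Sketch: choose initial centers by repeated sampling, then standardize the data
    and those centers with the same parameters and run k-means once more.
    Returned centers and SSE are in the standardized space."""
    init = sampled_initial_centers(data, k, n_samples, sample_frac, seed)
    mus, sds = zscore_params(data)
    return kmeans(apply_zscore(data, mus, sds), apply_zscore(init, mus, sds))


if __name__ == "__main__":
    rng = random.Random(42)
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    points += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(50)]
    centers, labels, sse = hk_means_sketch(points, k=2)
    print("cluster sizes:", [labels.count(j) for j in range(2)], "SSE:", round(sse, 3))
```

On two well-separated groups of 2-D points, the sketch should place each group in its own cluster. The final centers are chosen by full-data SSE across several subsamples. As a result, a single unlucky random initialization has less influence on the outcome than in plain k-means, which is the effect the abstract reports for Hk-means.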
Liu Shenghui, Tan Yanna, Huang Tao
Subject: Computing technology, computer technology
Keywords: data mining; clustering algorithm; k-means algorithm
Liu Shenghui, Tan Yanna, Huang Tao. Research on a clustering algorithm based on k-means [EB/OL]. (2010-12-08) [2025-04-29]. http://www.paper.edu.cn/releasepaper/content/201012-198.