National Preprint Platform

Research and Improvement on K-means

Abstract

The clustering quality of the traditional K-means algorithm depends heavily on the chosen number of clusters and on the initial cluster centers. To address this, we take document density as a parameter, divide the documents into several blocks, and then cluster each block separately, which yields a clustering algorithm suited to text data. This reduces the algorithm's dependence on the initial centers and allows the category of newly added data to be located quickly. Experiments show that the algorithm produces relatively high-quality clusters and adapts comparatively better to newly added data.
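The block-then-cluster idea in the abstract can be illustrated with a minimal sketch. It is not the authors' GADK-means algorithm, whose details the abstract does not give. It assumes documents are already represented as low-dimensional numeric tuples, that "blocks" are fixed-size grid cells, and that a cell counts as dense when it holds at least `min_density` points. The function names and the parameters `cell_size`, `min_density`, and `k_per_block` are all hypothetical.

```python
from collections import defaultdict
from math import dist, floor


def cell_of(p, cell_size):
    """Grid cell index of a point (assumed block definition)."""
    return tuple(floor(x / cell_size) for x in p)


def grid_blocks(points, cell_size, min_density):
    """Group points by grid cell; keep cells with at least min_density points."""
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, cell_size)].append(p)
    return {k: v for k, v in cells.items() if len(v) >= min_density}


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means, seeded with the first k distinct points."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            groups[i].append(p)
        # Keep the old center if a group ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def fit(points, cell_size=1.0, min_density=2, k_per_block=1):
    """Cluster each dense block separately; returns {cell: [centers]}."""
    blocks = grid_blocks(points, cell_size, min_density)
    return {key: kmeans(pts, k_per_block) for key, pts in blocks.items()}


def assign(model, p, cell_size=1.0):
    """Label a new point: search its own block first, else all blocks."""
    key = cell_of(p, cell_size)
    candidates = {key: model[key]} if key in model else model
    return min(
        ((k, i) for k, cs in candidates.items() for i in range(len(cs))),
        key=lambda ki: dist(p, candidates[ki[0]][ki[1]]),
    )
```

In this sketch a new point usually needs distance computations only against the centers of its own block, which is one plausible reading of how blocking could speed up the placement of newly added data. Sparse cells are simply dropped here; a real implementation would need a policy for them.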

Authors: Lü Yuqin (吕玉琴), Zheng Lijie (郑立杰), Liu Gang (刘刚), Zhao Xin (赵鑫)

Subject: Computing technology, computer technology

Keywords: K-means; density; grid; purity; GADK-means

Lü Yuqin, Zheng Lijie, Liu Gang, Zhao Xin. Research and Improvement on K-means [EB/OL]. (2008-11-14) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/200811-397.
