National Preprint Platform

Research on Short-Text Clustering Based on the Ant-Tree Algorithm

An Improved Ant-Tree Algorithm for Short-Text Clustering


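Because term frequencies in short texts are very low, conventional clustering algorithms such as K-means perform poorly on them and rarely reach acceptable accuracy. Recent methods that combine bio-inspired algorithms with internal clustering validity measures can effectively improve short-text clustering. For short-text clustering, this paper proposes an improved Ant-Tree algorithm. The algorithm introduces the silhouette coefficient as an internal validity measure: it computes silhouette coefficient values for the initial partition obtained by K-means, sorts the samples of each cluster by these values, and applies the resulting order in the initialization step of Ant-Tree, which improves Ant-Tree's performance. Experimental results show that the algorithm is more accurate than the other algorithms.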

Short-text document clustering is considered more difficult than general document clustering because term frequencies are low. Conventional clustering algorithms such as K-means do not produce good or acceptable results on short texts. However, some recent works have presented new bio-inspired clustering algorithms and novel uses of internal clustering validity measures to deal with this difficult problem. In this paper, an improved Ant-Tree algorithm for short-text clustering is proposed. The algorithm uses K-means to obtain an initial grouping and sorts the grouped samples in decreasing order of their Silhouette Coefficient. This ordering is then used in Ant-Tree's initialization step, which improves the performance of Ant-Tree. Experimental results show that this method is more accurate than the other algorithms.
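The ordering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `silhouette_values` and `silhouette_order`, the Euclidean distance, the toy 2-D points, and the hard-coded labels (standing in for a K-means partition of short-text vectors) are all assumptions made for illustration. The sketch sorts all samples in one global list, whereas the paper may sort within each cluster. The Ant-Tree step itself is not shown.

```python
import math

def silhouette_values(points, labels):
    """Return the silhouette coefficient s(i) of every sample.

    s(i) = (b - a) / max(a, b), where a is the mean distance to the other
    members of the sample's own cluster and b is the smallest mean distance
    to any other cluster. Requires at least two distinct labels.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: s(i) is conventionally set to 0.
            scores.append(0.0)
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def silhouette_order(points, labels):
    """Sample indices sorted by decreasing silhouette coefficient."""
    scores = silhouette_values(points, labels)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)

# Toy data (assumption): 2-D points in place of short-text feature vectors,
# and fixed labels in place of an initial K-means grouping.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 0, 1, 1]
order = silhouette_order(points, labels)
print(order)
```

In this toy data the point `(3.0, 0.0)` sits furthest from its cluster mates and closest to the other cluster, so it has the lowest silhouette value and comes last in the ordering. In the scheme the abstract describes, an ordering of this kind is what is passed to Ant-Tree's initialization step.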

Wu Yong (吴勇), Li Renfa (李仁发), Liu Yufeng (刘钰峰)

Computing technology, computer technology

Short-text clustering; Ant-Tree; K-means; Silhouette Coefficient

Wu Yong, Li Renfa, Liu Yufeng. Research on Short-Text Clustering Based on the Ant-Tree Algorithm [EB/OL]. (2011-06-03) [2025-08-10]. http://www.paper.edu.cn/releasepaper/content/201106-74.
