Adaptive Distance Density Peaks Clustering Undersampling
Imbalanced class distributions make classification harder. To reduce this effect, this paper combines density peaks clustering (DPC) with resampling and proposes an adaptive density peaks clustering undersampling algorithm, which removes the need to set the DPC truncation distance by hand. The algorithm is driven by classification performance: it uses a random forest together with Bayesian optimization to select the truncation distance automatically, so that the classification results obtained after DPC undersampling are optimized. Searching with Bayesian optimization also lowers the running-time cost, making the method more efficient. To verify the improvement, the proposed algorithm is compared with several existing algorithms in terms of sampling effect, which confirms its effectiveness in practice.
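The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the Gaussian-kernel density, the rule of keeping the majority samples with the largest rho * delta, and the names `dpc_scores`, `dpc_undersample`, `select_cutoff`, and `score_fn` are all hypothetical. To stay dependency-free, a plain scan over candidate truncation distances stands in for Bayesian optimization, and a nearest-centroid balanced accuracy stands in for the random forest score.

```python
import numpy as np

def dpc_scores(X, d_c):
    """Return DPC local density rho and separation delta for each row of X."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Gaussian-kernel density (an assumed choice); subtract the self-contribution.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets its max distance.
        delta[i] = dist[i, denser].min() if denser.any() else dist[i].max()
    return rho, delta

def dpc_undersample(X_major, n_keep, d_c):
    """Keep the n_keep majority samples with the largest rho * delta (assumed rule)."""
    rho, delta = dpc_scores(X_major, d_c)
    keep = np.argsort(rho * delta)[::-1][:n_keep]
    return X_major[keep]

def select_cutoff(X_major, n_keep, candidates, score_fn):
    """Pick the truncation distance whose undersampled set scores best.

    A plain candidate scan, used here as a stand-in for Bayesian optimization.
    """
    best_d_c, best_score = None, -np.inf
    for d_c in candidates:
        score = score_fn(dpc_undersample(X_major, n_keep, d_c))
        if score > best_score:
            best_d_c, best_score = d_c, score
    return best_d_c, best_score

# Synthetic imbalanced data: 200 majority samples, 20 minority samples.
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(200, 2))
X_minor = rng.normal(2.5, 0.5, size=(20, 2))

def score_fn(X_kept):
    """Balanced accuracy of a nearest-centroid rule (stand-in for a random forest)."""
    c_major, c_minor = X_kept.mean(axis=0), X_minor.mean(axis=0)

    def predicts_minor(X):
        return np.linalg.norm(X - c_minor, axis=1) < np.linalg.norm(X - c_major, axis=1)

    recall_minor = predicts_minor(X_minor).mean()
    recall_major = 1.0 - predicts_minor(X_major).mean()
    return 0.5 * (recall_minor + recall_major)

candidates = [0.1, 0.3, 1.0, 3.0]
best_d_c, best_score = select_cutoff(X_major, len(X_minor), candidates, score_fn)
print(best_d_c, best_score)
```

To move closer to what the abstract describes, `score_fn` could return a cross-validated random forest score on the balanced set, and the candidate scan could be replaced by a Bayesian optimizer that proposes each next truncation distance from the scores seen so far.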
Peng Deguang (彭德光), Li Guizhu (李贵珠)
Computing technology, computer technology
Unbalanced data; DPC; Bayesian optimization algorithm
Peng Deguang, Li Guizhu. Adaptive Distance Density Peaks Clustering Undersampling [EB/OL]. (2024-03-08) [2025-08-04]. http://www.paper.edu.cn/releasepaper/content/202403-96.