A Parallel K-means Algorithm Supporting Differential Privacy Protection and Outlier Elimination

Abstract

To address privacy protection in cluster analysis in big-data environments, this paper proposes a parallel K-means algorithm, built on the MapReduce computing framework, that supports differential privacy protection and outlier elimination. The algorithm computes in parallel the Euclidean distance matrix between the points of the data set and the nearest-neighbor hypersphere radii, from which it derives the decision threshold for outliers; on that basis it completes initial cluster center selection and the parallel clustering process under differential privacy protection. Theoretical analysis proves that the whole algorithm satisfies ε-differential privacy. Experimental results show that the algorithm strikes a good balance among the effectiveness of privacy protection, the usability of the clustering results, and execution efficiency, and that it outperforms comparable algorithms.
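The pipeline summarized in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: this page does not give the threshold rule, the privacy-budget allocation, or the private initial-center selection, so the code below makes several labeled assumptions. The outlier threshold is taken to be a hypothetical `factor` times the mean nearest-neighbor distance, the budget ε is split evenly across iterations, the initial centers are supplied by the caller, and the points are assumed to lie in [0, 1]^d so that the per-iteration (sum, count) query has L1 sensitivity d + 1, as in the standard Laplace-mechanism treatment of differentially private K-means. The outlier filter here is not itself privatized, and the map and reduce phases are plain loops rather than MapReduce jobs.

```python
import math
import random


def laplace(scale, rng):
    # The difference of two i.i.d. Exp(1) draws, times `scale`, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def remove_outliers(points, factor=2.0):
    # Nearest-neighbor distance of each point (its "hypersphere radius").
    radii = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    # Assumed threshold rule: `factor` times the mean radius (hypothetical).
    threshold = factor * sum(radii) / len(radii)
    return [p for p, r in zip(points, radii) if r <= threshold]


def dp_kmeans(points, centers, epsilon, iterations, rng):
    # Assumes every point lies in [0, 1]^dim.
    dim = len(points[0])
    eps_iter = epsilon / iterations   # assumed uniform budget split
    scale = (dim + 1) / eps_iter      # sensitivity (dim + 1) over per-iteration budget
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in centers]
        counts = [0] * len(centers)
        # "Map" phase: assign each point to its nearest current center.
        for p in points:
            k = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            counts[k] += 1
            for d in range(dim):
                sums[k][d] += p[d]
        # "Reduce" phase: perturb sums and counts, then recompute the centers.
        new_centers = []
        for k in range(len(centers)):
            noisy_count = counts[k] + laplace(scale, rng)
            if noisy_count < 1.0:
                # Too little (noisy) mass: keep the previous center.
                new_centers.append(tuple(centers[k]))
                continue
            new_centers.append(tuple(
                min(1.0, max(0.0, (sums[k][d] + laplace(scale, rng)) / noisy_count))
                for d in range(dim)
            ))
        centers = new_centers
    return centers
```

A typical call would first filter the data with `remove_outliers(points)` and then run `dp_kmeans(filtered, initial_centers, epsilon, iterations, random.Random(seed))`. Smaller values of `epsilon` inject more noise into each update, which is the trade-off between privacy and clustering usability that the abstract refers to.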

Liu Jianwei (刘建伟), Fan Yikang (樊一康)

10.12074/201804.01420V1

Computing technology, computer technology

K-means clustering; outlier elimination; differential privacy; MapReduce

Liu Jianwei, Fan Yikang. A Parallel K-means Algorithm Supporting Differential Privacy Protection and Outlier Elimination [EB/OL]. (2018-04-12) [2025-08-16]. https://chinaxiv.org/abs/201804.01420.
