A Distributed k-anonymity Algorithm Based on k-means
With the arrival of the big data era, demand for data sharing and data publishing is steadily increasing. However, publishing or sharing raw data without anonymization leads to privacy disclosure. The k-anonymity principle is widely used as a basic principle for anonymous data publishing. In big data scenarios, a single computing node can no longer meet the computational requirements, which makes the study of distributed k-anonymity algorithms particularly important. Targeting anonymous data publishing in big data scenarios, and building on the widely used k-anonymity principle and the k-means algorithm, this paper proposes a binary k-clustering algorithm as a distributed solution, along with a quick binary k-clustering algorithm that optimizes time complexity. Tests show that, compared with a basic heuristic distributed k-anonymity algorithm, the quick binary k-clustering algorithm runs noticeably faster and incurs less data loss.
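For readers unfamiliar with the k-anonymity principle named in the abstract, the following is a minimal sketch of the standard definition only: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. It is not the paper's binary k-clustering algorithm, whose details the abstract does not give. The function name `is_k_anonymous`, the field names, and the sample records are hypothetical and chosen for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in >= k records.

    `records` is a list of dicts; `quasi_identifiers` lists the keys that
    together could re-identify a person (illustrative assumption).
    """
    # Count how many records fall into each equivalence class.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized sample data.
records = [
    {"age": "20-29", "zip": "100**", "disease": "flu"},
    {"age": "20-29", "zip": "100**", "disease": "cold"},
    {"age": "30-39", "zip": "200**", "disease": "flu"},
]

# The third record sits alone in its equivalence class, so the table
# satisfies 1-anonymity but not 2-anonymity.
print(is_k_anonymous(records, ["age", "zip"], 1))
print(is_k_anonymous(records, ["age", "zip"], 2))
```

A clustering-based anonymizer would typically group records into clusters of at least k members and generalize each cluster's quasi-identifiers to a common value, after which a check like this one should pass.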
Cheng Xiang (程祥), Zhang Qiying (张琦颖)
Computing technology, computer technology
computer software; k-means algorithm; big data; data release; k-anonymity
Cheng Xiang, Zhang Qiying. A Distributed k-anonymity Algorithm Based on k-means [EB/OL]. (2017-12-22) [2025-08-21]. http://www.paper.edu.cn/releasepaper/content/201712-285.