国家预印本平台 (National Preprint Platform)

Efficient K-buckets Skyline Query Algorithm on High-dimensional Sparse Data

Abstract

Skyline query processing over high-dimensional data has become an active research topic. Existing approaches rely mainly on dimensionality reduction and k-dominant skyline queries, but they assume that data objects are complete and accurate. In practice, high-dimensional data objects (especially collected network data) are often incomplete. The bucket algorithm is an effective existing method for incomplete data, but its number of buckets grows exponentially with dimensionality, which wastes storage space and lowers skyline query efficiency as the bucket count increases. To save storage space and reduce the skyline candidate set on high-dimensional sparse data, this paper introduces the concept of high-dimensional sparse k-buckets and proposes an efficient k-buckets skyline query algorithm. The algorithm effectively controls the number of buckets and uses a specific bucket-filling strategy to reduce the number of candidate skyline objects. Experiments show that the algorithm is particularly well suited to large-scale high-dimensional sparse data, and that its advantage grows as the data become sparser.
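As a rough illustration of the conventional bucket idea the abstract refers to, the sketch below groups incomplete objects by which attributes are present and computes a local skyline inside each bucket. It is not the paper's k-buckets algorithm, whose bucket-filling strategy the abstract does not describe. The names `presence_mask`, `bucketize`, `dominates`, `local_skyline`, and `max_buckets` are hypothetical, `None` is assumed to mark a missing value, and smaller values are assumed to be better.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Assumption: a data object is a tuple in which None marks a missing attribute.
Point = Tuple[Optional[float], ...]
Mask = Tuple[bool, ...]


def presence_mask(p: Point) -> Mask:
    """Return which dimensions of p carry a value."""
    return tuple(v is not None for v in p)


def bucketize(points: List[Point]) -> Dict[Mask, List[Point]]:
    """Group objects so that every bucket shares one presence pattern."""
    buckets: Dict[Mask, List[Point]] = defaultdict(list)
    for p in points:
        buckets[presence_mask(p)].append(p)
    return dict(buckets)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b on the dimensions both have (assumption: smaller is better)."""
    common = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not common:
        return False
    return all(x <= y for x, y in common) and any(x < y for x, y in common)


def local_skyline(bucket: List[Point]) -> List[Point]:
    """Objects in a bucket that no other object of the same bucket dominates."""
    return [p for p in bucket
            if not any(dominates(q, p) for q in bucket if q is not p)]


def max_buckets(d: int) -> int:
    """Number of non-empty presence patterns over d dimensions."""
    return 2 ** d - 1


points: List[Point] = [
    (1.0, None, 3.0),
    (2.0, None, 4.0),
    (None, 5.0, None),
    (0.5, None, 9.0),
]
buckets = bucketize(points)
print(len(buckets), "occupied buckets out of", max_buckets(3), "possible")
print(local_skyline(buckets[(True, False, True)]))
```

With d attributes there are 2^d - 1 possible non-empty presence patterns, which is the exponential growth the abstract mentions. Storing only occupied buckets, as the dictionary here does, is one simple way to avoid allocating empty ones, since the number of occupied buckets can never exceed the number of objects.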

李建中、王宏志、徐妍妍、高宏

Computing technology; computer technology

Keywords: high-dimensional; sparse; skyline query; k-buckets

李建中, 王宏志, 徐妍妍, 高宏. Efficient K-buckets Skyline Query Algorithm on High-dimensional Sparse Data [EB/OL]. (2012-12-20) [2025-08-11]. http://www.paper.edu.cn/releasepaper/content/201212-493.
