National Preprint Platform

An Inverted Index Compression Method

Abstract

Efficient access to the inverted index is key to a search engine's fast response to user queries, and compressing posting lists is one of the most important ways to improve search engine performance. This paper studies the adaptive segmented compression (ASCS) algorithm and proposes three improvements. First, because the uniform segmentation that ASCS adopts is not the optimal segmentation, the artificial bee colony (ABC) algorithm is used to optimize how sequences are segmented. Second, because ASCS considers only a single factor affecting the space a sequence occupies, an improved algorithm that accounts for multiple factors is proposed. Third, because ASCS compresses long, unevenly distributed sequences poorly, the integer sequence is first sorted and differentially encoded, and only then compressed with ASCS. Comparative experiments show that the improved algorithm achieves a noticeably higher compression ratio on the inverted index file than the original ASCS algorithm.
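The sort-then-difference step and the effect of segment boundaries can be illustrated with a minimal Python sketch. It is not the ASCS algorithm or the ABC optimizer from the paper: the cost model (each segment packed at the bit width of its largest value, plus a fixed header) and the names `segment_cost_bits`, `uniform_boundaries`, and `header_bits` are assumptions chosen only to show why uniform segmentation need not be optimal.

```python
from typing import List, Sequence


def delta_encode(doc_ids: Sequence[int]) -> List[int]:
    """Sort the IDs, then keep the first value followed by successive gaps."""
    ordered = sorted(doc_ids)
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def delta_decode(gaps: Sequence[int]) -> List[int]:
    """Rebuild the sorted IDs as a running sum of the gaps."""
    out, total = [], 0
    for gap in gaps:
        total += gap
        out.append(total)
    return out


def uniform_boundaries(n: int, size: int) -> List[int]:
    """Exclusive end index of each fixed-size segment over n values."""
    return [min(start + size, n) for start in range(0, n, size)]


def segment_cost_bits(values: Sequence[int], boundaries: Sequence[int],
                      header_bits: int = 8) -> int:
    """Hypothetical cost model: every non-empty segment stores a header and
    packs its values at the bit width of the segment's largest value."""
    cost, start = 0, 0
    for end in boundaries:
        segment = values[start:end]
        width = max(1, max(segment).bit_length())
        cost += header_bits + width * len(segment)
        start = end
    return cost


doc_ids = [1000, 3, 7, 5, 1004, 1001, 9, 1002]
gaps = delta_encode(doc_ids)  # [3, 2, 2, 2, 991, 1, 1, 2]

uniform_cost = segment_cost_bits(gaps, uniform_boundaries(len(gaps), 4))  # 64 bits
custom_cost = segment_cost_bits(gaps, [4, 5, 8])                          # 48 bits
```

Under this assumed cost model, isolating the single large gap in its own segment saves 16 bits over two uniform segments of four values. A search procedure such as ABC would explore candidate boundary lists and keep the cheapest one.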

李宛蓉、贺思云、白福均、肖绍武、高建瓴

10.12074/201805.00238V1

Computing technology; computer technology

search engine; inverted index; index compression; artificial bee colony algorithm; SCS algorithm

李宛蓉, 贺思云, 白福均, 肖绍武, 高建瓴. An Inverted Index Compression Method [EB/OL]. (2018-05-20) [2025-08-16]. https://chinaxiv.org/abs/201805.00238.
