HBase Secondary Index Design Based on Aggregation of Index Columns

HBase, Apache's open-source column-oriented database, addresses the problem of storing massive data sets to a certain extent and provides efficient reads and writes. However, because HBase lacks secondary indexes, a query on a non-rowkey column must use a filter combined with a full table scan, which performs poorly on very large data sets. Drawing on the index structure of HBase rowkeys and the secondary index structure of relational databases, this paper proposes a secondary index scheme based on aggregating index column values. The proposed mechanism also supports composite indexes and the handling of special index column values, which improves secondary index performance and broadens the scenarios in which secondary indexes apply. Finally, system tests show that the secondary index greatly improves HBase query efficiency.
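The abstract does not spell out the index layout, so the following is only a generic sketch of the idea it builds on, not the authors' design. It assumes an index table whose rowkey is formed from the index column values followed by the data rowkey, so that a lookup becomes a prefix scan over sorted keys instead of a filtered full table scan. The in-memory store, the `SEP` and `ESC` bytes, and all class and function names are hypothetical; escaping is shown as one plausible way to handle column values that contain the delimiter.

```python
import bisect

SEP = "\x00"  # hypothetical delimiter between index-key parts
ESC = "\x01"  # hypothetical escape character


def escape(part: str) -> str:
    # Escape ESC first, then SEP, so an escaped part never contains a bare SEP.
    return part.replace(ESC, ESC + ESC).replace(SEP, ESC + "\x02")


def unescape(part: str) -> str:
    # Inverse of escape(): ESC ESC -> ESC, ESC \x02 -> SEP.
    out, i = [], 0
    while i < len(part):
        if part[i] == ESC:
            i += 1
            out.append(ESC if part[i] == ESC else SEP)
        else:
            out.append(part[i])
        i += 1
    return "".join(out)


class IndexedTable:
    """Toy stand-in for a data table plus a sorted index table."""

    def __init__(self, index_columns):
        self.rows = {}                  # data table: rowkey -> {column: value}
        self.index_columns = index_columns
        self.index_keys = []            # index-table rowkeys, kept sorted

    def put(self, rowkey, row):
        # Simplification: overwriting a row does not remove its old index entry.
        self.rows[rowkey] = row
        parts = [escape(row[c]) for c in self.index_columns] + [escape(rowkey)]
        bisect.insort(self.index_keys, SEP.join(parts))

    def full_scan(self, column, value):
        # Baseline: filter every row, as a query without an index must do.
        return sorted(k for k, r in self.rows.items() if r.get(column) == value)

    def index_lookup(self, *values):
        # Prefix scan: values for the leading index columns select a
        # contiguous range of sorted index keys.
        prefix = "".join(escape(v) + SEP for v in values)
        start = bisect.bisect_left(self.index_keys, prefix)
        hits = []
        for key in self.index_keys[start:]:
            if not key.startswith(prefix):
                break
            hits.append(unescape(key.split(SEP)[-1]))
        return hits


table = IndexedTable(["city", "name"])
table.put("r1", {"city": "Beijing", "name": "a\x00b"})  # value contains SEP
table.put("r2", {"city": "Beijing", "name": "a"})
table.put("r3", {"city": "Shanghai", "name": "a"})
```

With this layout, `table.index_lookup("Beijing")` uses only the leading column of the composite index, while `table.index_lookup("Beijing", "a")` uses both. Because of the escaping, the second call does not match the row whose name contains the delimiter.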

Shuang Kai (双锴), Zhang Yi (张祎)

Computing technology, computer technology

Computer Software; HBase; Secondary Index; Aggregation; Escaping

Shuang Kai, Zhang Yi. HBase Secondary Index Design Based on Aggregation of Index Columns [EB/OL]. (2017-12-08) [2025-08-23]. http://www.paper.edu.cn/releasepaper/content/201712-107.
