|国家预印本平台
首页|一种基于匹配区域特征的相似字符串匹配过滤算法

一种基于匹配区域特征的相似字符串匹配过滤算法

Filter Algorithm for Approximate String Matching Based on Match-Region Features

中文摘要英文摘要

为加快相似字符串匹配中过滤算法的匹配速度,提出了一种基于匹配区域特征的过滤算法。该算法对文本串预处理采用了q-gram索引结构;对模式串和文本串进行了固定长度的逻辑分区,并从分区中提取了新的匹配区域特征,而后利用这些特征优化了基础q-gram过滤定理;还改进了基于分区策略的过滤区确定方案。实验结果表明,该算法通过提高过滤效率有效地加快了过滤算法的匹配速度,尤其在误差容忍率较小时效果更佳。

he aim of this study is to improve the speed of approximate string matching. Filter algorithm for approximate string matching is discussed because it is suitable for large-scale text searching. A novel filter algorithm based on Match-Region features is presented. First, a q-gram index is used to preprocess text. Second, both pattern and text are logically divided into blocks of fixed size kq+1, and then new Match-Region features are extracted from blocks, and the algorithm optimizes the fundamental q-gram filtration lemma by the new features. Last, the improved method of choosing Filtration-Region based on blocks is used for filtration. The experimental results demonstrate that the algorithm achieves higher matching speed by way of improving filtration efficiency, and they also reveal the relationship between matching speed and error rate of new algorithm. In particular, its matching speed is faster than that of QUASAs’s Block Addressing under low error rate. These results suggest that the algorithm is useful in a system for approximate string matching with low error rate.

张伟、孙德才、孙星明、刘玉玲

计算技术、计算机技术

相似字符串匹配过滤算法匹配区域特征过滤效率q-gram

pproximate String MatchingFilter AlgorithmMatch-Region FeatureFiltration Efficiencyq-gram

张伟,孙德才,孙星明,刘玉玲.一种基于匹配区域特征的相似字符串匹配过滤算法[EB/OL].(2009-01-12)[2025-08-02].http://www.paper.edu.cn/releasepaper/content/200901-485.点此复制

评论