国家预印本平台 (National Preprint Platform)

A Preprocessing Technique for Chinese Word Segmentation (一种中文分词的预处理技术)


This paper first analyzes the vocabulary-based maximum matching algorithm for word segmentation and points out its defect. To address this defect, it proposes a preprocessing technique that uses high-frequency words: exploiting the characteristics of high-frequency words, it splits a sentence into as many segments as possible in very few steps, and then applies maximum matching to each segment. Finally, experimental data show that this technique improves the efficiency of Chinese word segmentation.
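The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea under stated assumptions: high-frequency words are taken to be single characters, every occurrence is treated as a cut point, and forward maximum matching is used inside each segment. The function names (`split_on_high_freq`, `forward_max_match`, `segment`) and the sample vocabulary are hypothetical and are not taken from the paper.

```python
def forward_max_match(text, vocab, max_len):
    """Forward maximum matching: at each position take the longest vocabulary word."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + size]
            if size == 1 or cand in vocab:
                words.append(cand)
                i += size
                break
    return words


def split_on_high_freq(text, high_freq):
    """Cut a sentence at high-frequency characters (assumption: one character each).

    Returns (segment, is_high_freq) pairs in their original order.
    """
    segments, start = [], 0
    for i, ch in enumerate(text):
        if ch in high_freq:
            if start < i:
                segments.append((text[start:i], False))
            segments.append((ch, True))
            start = i + 1
    if start < len(text):
        segments.append((text[start:], False))
    return segments


def segment(text, vocab, high_freq):
    """Preprocess with high-frequency cut points, then match each segment."""
    max_len = max(map(len, vocab), default=1)
    out = []
    for seg, is_hf in split_on_high_freq(text, high_freq):
        out.extend([seg] if is_hf else forward_max_match(seg, vocab, max_len))
    return out


# Hypothetical toy data, for illustration only.
vocab = {"朋友", "喜欢", "北京", "秋天"}
print(segment("我的朋友喜欢北京的秋天", vocab, {"的"}))
```

Because each segment is shorter than the full sentence, the matching window near a segment boundary shrinks and fewer failed dictionary lookups are attempted, which is one plausible source of the efficiency gain. Cutting unconditionally at a character such as 的 would wrongly split words that contain it (for example 目的), so a real implementation would need an extra check that this sketch omits.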

章栋兵 (Zhang Dongbing), 姚寒冰 (Yao Hanbing)

Language: Chinese

Keywords: high-frequency words; preprocessing; Chinese word segmentation

章栋兵,姚寒冰.一种中文分词的预处理技术[EB/OL].(2010-01-05)[2025-08-17].http://www.paper.edu.cn/releasepaper/content/201001-74.
