National Preprint Platform (国家预印本平台)

Research on Chinese Word Segmentation Based on the Variable Window Algorithm (基于可变窗口算法的中文分词应用研究)

Abstract

With the rapid growth of the Chinese-language internet, real-time processing of the massive volume of Chinese data online has become a topic of considerable interest. Automatic word segmentation of Chinese text is a fundamental component of Chinese information processing systems and directly affects their efficiency and accuracy. Two major problems in Chinese word segmentation have never been fully solved: ambiguity identification and the identification of unlisted (out-of-vocabulary) words. The sheer variety and number of unlisted words is a serious obstacle to processing large-scale real-world text. This paper first analyzes these two problems, then examines in detail the existing approaches to the unlisted-word problem, and proposes a statistics-based variable window algorithm for recognizing proper nouns among unlisted words. Experiments show that the proposed algorithm is feasible to a certain degree.

张景春、单吉峰、张坤

Language: Chinese

Keywords: Chinese Word Segmentation; Variable Window Algorithm; Lexicon Mechanism (dictionary-based segmentation)

张景春, 单吉峰, 张坤. 基于可变窗口算法的中文分词应用研究 [EB/OL]. (2008-12-09) [2025-08-11]. http://www.paper.edu.cn/releasepaper/content/200812-246.
