National Preprint Platform

Implementation of a Chinese Word Segmentation Plug-in Based on Nutch

The Implementation of Chinese Word Segmentation in Nutch

Abstract

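Chinese word segmentation is a key technology in Chinese vertical search engines, and the quality of segmentation directly affects the precision of the extracted text. Nutch is an open-source Web search engine that provides fairly good search results for English-language users. This paper therefore analyzes the Nutch search engine platform in depth and implements a Chinese word segmentation plug-in on it, giving Nutch the ability to process Chinese-language information.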

Chinese word segmentation is a key technology for Chinese search engines. The quality of segmentation directly affects the accuracy of text extraction. Nutch is an open-source Web search engine that aims to provide good search results for English-language content. After an in-depth analysis of the Nutch search engine platform, this paper implements a word segmentation plug-in that gives Nutch the ability to process Chinese text.

Liu Yiwei (刘一伟), Zhang Wenlong (张文龙), Sun Jie (孙杰)

Computing technology, computer technology

Nutch; vertical search; Chinese word segmentation plug-in; information crawling

Nutch; Vertical Search; word segmentation plug-in; information abstracting

Liu Yiwei, Zhang Wenlong, Sun Jie. Implementation of a Chinese Word Segmentation Plug-in Based on Nutch [EB/OL]. (2010-12-31) [2025-08-21]. http://www.paper.edu.cn/releasepaper/content/201012-1405.
