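“两个半天” (“two half-days”) and “两个半月” (“two and a half months”): A Brief Account of Number-related Time Morphemes for Lexical Auto-analysis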
Chinese Lexical Auto-analysis of Time Words Related to Numbers
The word-class status of number-related time morphemes has long been a hotly debated issue in Chinese linguistics, and in automatic Chinese lexical analysis these morphemes are among the elements most prone to confusion and inconsistency. From the perspective of lexical auto-analysis in Chinese information processing, this paper examines the word-class assignment of number-related time morphemes, investigates how they are POS-tagged and segmented in a 12-million-character corpus of authentic text, and proposes improved word-segmentation and POS-tagging principles for these morphemes, oriented to automatic Chinese lexical analysis.
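The segmentation ambiguity named in the title can be illustrated with a minimal sketch. It assumes a hypothetical toy lexicon and uses forward and backward maximum matching, two textbook dictionary-based segmenters chosen here purely for illustration; they are not the method or the segmentation standard proposed in this paper. By meaning, 两个半天 groups as 两个 + 半天 (“two half-days”), while 两个半月 groups as 两个半 + 月 (“two and a half months”), yet each greedy direction imposes the same grouping on both strings.

```python
# Hypothetical toy lexicon for illustration only; NOT the paper's dictionary.
LEXICON = {"两", "个", "半", "天", "月", "两个", "两个半", "半天", "半月"}
MAX_LEN = max(len(w) for w in LEXICON)


def forward_max_match(text, lexicon=LEXICON, max_len=MAX_LEN):
    """Greedy left-to-right longest match; unknown single chars become tokens."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, lexicon=LEXICON, max_len=MAX_LEN):
    """Greedy right-to-left longest match; tokens are reversed at the end."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    return tokens[::-1]


if __name__ == "__main__":
    for s in ("两个半天", "两个半月"):
        print(s, forward_max_match(s), backward_max_match(s))
    # 两个半天 ['两个半', '天'] ['两个', '半天']
    # 两个半月 ['两个半', '月'] ['两个', '半月']
```

Forward matching gets 两个半月 right but not 两个半天; backward matching does the reverse. A lexicon alone therefore cannot decide between the two groupings, which is why explicit segmentation and tagging principles for these morphemes are needed.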
冯敏萱 (Feng Minxuan), 张霄军 (Zhang Xiaojun)
Chinese Linguistics
number-related time morphemes, word class, lexical auto-analysis, POS tagging, word segmentation
冯敏萱,张霄军.“两个半天”和“两个半月”——面向词法自动分析的涉数时间语素说略[EB/OL].(2008-02-20)[2025-08-16].http://www.paper.edu.cn/releasepaper/content/200802-197.