国家预印本平台 (National Preprint Platform)

On the Construction of a Mini Chinese Language System and the Automatic Generation of Articles


This paper attempts to build a mini Chinese language system that contains only a small number of Chinese characters, and uses it to study how a computer can understand the meaning of Chinese sentences, how Chinese words are formed, and how Chinese articles can be generated automatically. The system is modeled on a relatively simple ecosystem and is divided into four layers: the physical layer, the link layer, the rule layer, and the presentation layer. At the link layer, characters form connections with one another by learning sentences, and a parameter called sentence strength (Sentence Weights) is used to judge the meaning of a sentence. Once enough sentences have been formed, character combinations that occur with relatively high frequency are identified statistically and treated as multi-character words. At the rule layer, various rules are set so that the system produces longer articles, which achieves automatic article generation by computer. The results of running the program show that the system correctly distinguishes meaningful sentences from meaningless ones, and that the generated words are largely consistent with modern Chinese usage. We also tried an N-gram model on the same problems, and the results were disappointing. Finally, we used the system to generate articles. The computer's ability to generate longer articles is limited, the output format is rather uniform, and a considerable gap remains compared with articles written by humans. Nevertheless, the generated articles meet basic reading requirements in terms of readability, and there is ample room for improvement. The results of this work suggest that traditional computer architectures still have considerable untapped potential for natural language processing.
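The abstract does not state how link strengths or the sentence-strength parameter are computed, so the following is only a minimal sketch under assumed definitions, not the author's method. It assumes that a link is a count of ordered character pairs co-occurring within a learned sentence, that sentence strength is the mean link count over all ordered pairs in the sentence, and that a candidate word is an adjacent character pair seen at least `min_count` times. The function names, the toy corpus, and the threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_links(sentences):
    """Assumed link model: count ordered character pairs within each sentence."""
    links = Counter()
    for sentence in sentences:
        # combinations() preserves position order, so (a, b) means a precedes b.
        links.update(combinations(sentence, 2))
    return links


def sentence_strength(sentence, links):
    """Assumed score: mean link count over all ordered character pairs."""
    pairs = list(combinations(sentence, 2))
    if not pairs:
        return 0.0
    return sum(links[pair] for pair in pairs) / len(pairs)


def frequent_combinations(sentences, min_count=2):
    """Assumed word rule: adjacent pairs occurring at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        counts.update(a + b for a, b in zip(sentence, sentence[1:]))
    return sorted(word for word, n in counts.items() if n >= min_count)


# Hypothetical toy corpus drawn from a simple ecosystem (goats, wolves, grass).
corpus = ["山羊吃青草", "山羊怕狼", "狼吃山羊", "青草多"]
links = learn_links(corpus)

print(sentence_strength("狼吃青草", links))  # unseen but plausible: positive score
print(sentence_strength("草青吃狼", links))  # scrambled order: 0.0
print(frequent_combinations(corpus))         # ['山羊', '青草']
```

On this toy corpus the scrambled sentence scores zero because none of its ordered character pairs was ever learned, while the recurring pairs 山羊 and 青草 surface as candidate words. A real system would need a far larger corpus and the paper's actual scoring rule.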

程智

Chinese language; computing technology, computer technology

Chinese language processing; Artificial intelligence; Chinese language model

程智.微型汉语系统的构建及汉语文章的自动生成[EB/OL].(2012-03-05)[2025-08-18].http://www.paper.edu.cn/releasepaper/content/201203-132.
