
Chinese Spelling Error Correction Method Based on Similar Character Features (基于相似字特征的中文拼写纠错方法)

Abstract

Chinese Spelling Correction (CSC) aims to accurately detect and correct spelling errors in Chinese text. The task is essential for ensuring the accuracy and readability of text and is a key component of Chinese information processing. Current approaches to Chinese text correction rely mainly on BERT-based language models. Soft-Masked BERT is one of the mainstream methods: a detection network first estimates the error probability of the character at each position, and the feature vectors are then encoded through soft masking and fed into a BERT-based correction network. However, the original method handles two things poorly: the similarity relationship between input and output characters, and errors that span consecutive characters. To address these limitations, this project proposes a Chinese spelling correction model based on Soft-Masked BERT. To strengthen BERT's ability to capture similar-character features, we construct a similar-character graph and integrate it into the model through a graph attention network, so that the model can effectively take into account the relationship between each corrected character and its original character. The method also employs LSKnet units to learn associations between adjacent characters in candidate sentences, which improves overall sentence fluency. Experimental results show that the method performs well on the Chinese spelling correction task.
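The soft-masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard Soft-Masked BERT formulation, in which each character embedding is blended with the mask-token embedding in proportion to the detected error probability: e'_i = p_i * e_mask + (1 - p_i) * e_i. The function name `soft_mask` and the toy vectors are hypothetical and are not taken from the preprint; the similar-character graph and LSKnet components are not shown.

```python
from typing import List

Vector = List[float]


def soft_mask(embeddings: List[Vector],
              error_probs: List[float],
              mask_embedding: Vector) -> List[Vector]:
    """Blend each character embedding with the [MASK] embedding.

    For position i with detected error probability p_i:
        e'_i = p_i * e_mask + (1 - p_i) * e_i
    A probability near 1 pushes the input toward [MASK]; a probability
    near 0 leaves the original embedding almost unchanged.
    """
    if len(embeddings) != len(error_probs):
        raise ValueError("one error probability is required per character")
    blended = []
    for emb, p in zip(embeddings, error_probs):
        if not 0.0 <= p <= 1.0:
            raise ValueError("error probabilities must lie in [0, 1]")
        if len(emb) != len(mask_embedding):
            raise ValueError("embedding dimensions must match")
        blended.append([p * m + (1.0 - p) * x
                        for x, m in zip(emb, mask_embedding)])
    return blended


if __name__ == "__main__":
    # Toy 2-dimensional embeddings for a 3-character sentence (illustrative values).
    chars = [[1.0, 2.0], [3.0, 4.0], [2.0, 4.0]]
    probs = [0.0, 1.0, 0.5]   # hypothetical output of a detection network
    mask = [0.0, 0.0]         # hypothetical [MASK] embedding
    print(soft_mask(chars, probs, mask))
```

In a full model, the blended vectors would be passed to the BERT-based correction network in place of the raw input embeddings.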

滕欣煜

Language: Chinese

Keywords: Artificial Intelligence; Chinese spelling correction; LSKnet; BERT

滕欣煜. 基于相似字特征的中文拼写纠错方法 [EB/OL]. (2024-03-29) [2025-08-02]. http://www.paper.edu.cn/releasepaper/content/202403-447.
