HCADecoder: A Hybrid CTC-Attention Decoder for Chinese Text Recognition
Text recognition has attracted much attention in recent years and has achieved exciting results on several commonly used public English datasets. However, most of the well-established methods, such as connectionist temporal classification (CTC)-based methods and attention-based methods, pay little attention to the challenges of Chinese scenes, especially long text sequences. In this paper, we exploit the characteristics of the Chinese word frequency distribution and propose a hybrid CTC-Attention decoder (HCADecoder), supervised with bigram mixture labels, for Chinese text recognition. Specifically, we first add high-frequency bigram subwords to the original unigram (single-character) labels to construct the mixture bigram label, which shortens the decoding length. Then, in the decoding stage, the CTC module outputs a preliminary result, in which confused predictions are replaced with bigram subwords. Finally, the attention module takes this preliminary result and outputs the final result. Experimental results on four Chinese datasets demonstrate the effectiveness of the proposed method for Chinese text recognition, especially for long texts. Code will be made publicly available.
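The label-construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how high-frequency bigrams are selected or how a text line is segmented, so the top-k frequency cutoff, the greedy left-to-right matching, and the names `top_bigrams` and `to_mixture_label` are all assumptions made for illustration.

```python
from collections import Counter


def top_bigrams(corpus, k):
    """Return the k most frequent adjacent character pairs in the corpus.

    Assumption: "high-frequency" is taken to mean a simple top-k cutoff.
    """
    counts = Counter()
    for text in corpus:
        counts.update(text[i:i + 2] for i in range(len(text) - 1))
    return {bigram for bigram, _ in counts.most_common(k)}


def to_mixture_label(text, bigrams):
    """Segment text into a mix of bigram subwords and single characters.

    Assumption: greedy left-to-right matching; a known bigram is preferred,
    otherwise one character is emitted.
    """
    tokens, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in bigrams:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(text[i])
            i += 1
    return tokens


# Toy corpus, invented for this example only.
corpus = ["中国银行", "中国人民", "人民银行"]
bigrams = top_bigrams(corpus, k=3)
print(to_mixture_label("中国人民银行", bigrams))  # ['中国', '人民', '银行']
```

In this toy case the six-character label becomes three tokens, which is the shortening of the decoding length that the abstract refers to.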
Cai Siqi (蔡斯琪), Xue Wenyuan (薛文元), Li Qingyong (李清勇)
Computing technology, computer technology
Computer vision; Chinese text recognition; CTC-Attention; Subword
Cai Siqi, Xue Wenyuan, Li Qingyong. HCADecoder: A Hybrid CTC-Attention Decoder for Chinese Text Recognition [EB/OL]. (2021-03-23)[2025-04-27]. http://www.paper.edu.cn/releasepaper/content/202103-249.