Logistics Text Detection and Recognition Network Based on the Concentrate, Parallel, and Bimodal Principles

CPBNet: Concentrate, Parallel and Bimodal Network for Logistics Scene Text Detection and Recognition

马宇晨

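Abstract: In the logistics industry, parcel sorting is a key step in keeping shipments moving smoothly. Recognition of logistics forms directly determines sorting efficiency, yet effectively improving the accuracy of form text recognition in complex sorting environments remains an open research challenge. In this paper, we argue that the limitations of existing text models in logistics sorting scenarios stem mainly from: 1) interference caused by complex environments; 2) the trade-off between recognition accuracy and speed; and 3) the limitations of a single modality. Accordingly, we propose CPBNet, guided by the principles of concentration, parallelism, and bimodality. First, to handle complex scenes, we rectify the form in angle, geometry, and photometry. Then, in a parallel design, we add an Attention mechanism to the visual model to guide the CTC training strategy: the more accurate Attention model is used to train the backbone network and obtain better convolutional features, while the CTC branch makes the predictions, which preserves inference speed. Finally, a language model is added after the visual model for semantic correction; it fully learns the contextual information of the input to compensate for the semantics missing from the visual model. Existing general-purpose text datasets contain very few images of sorting scenes, and this lack of domain data has become a bottleneck for applying deep learning to sorting scenarios. We therefore simulate real form data to build our own sorting-scene dataset, and extensive experiments show that CPBNet has an advantage on this dataset and achieves state-of-the-art results.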

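Subject classification: Integrated transportation; Automation technology and automation equipment; Computing technology and computer technology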

Chinese keywords: artificial intelligence; text detection and recognition; logistics

Recommended citation: 马宇晨. CPBNet: Concentrate, Parallel and Bimodal Network for Logistics Scene Text Detection and Recognition [EB/OL]. (2024-03-01)[2025-11-03]. http://www.paper.edu.cn/releasepaper/content/202403-20.

Abstract: In the logistics industry, express parcel sorting is a key link in keeping shipments moving smoothly. The recognition of logistics forms directly determines sorting efficiency, but improving the accuracy of form text recognition in complex sorting environments remains a research challenge. We argue that the limitations of existing text models in logistics sorting scenarios come mainly from three sources: 1) interference caused by complex environments; 2) the difficulty of achieving both recognition accuracy and speed; and 3) the limitations of a single modality. Accordingly, this paper proposes CPBNet, based on the principles of concentrate, parallel, and bimodal. First, for complex scenes, we correct the form angularly, geometrically, and photometrically. Then, in a parallel design, an Attention mechanism is added to the visual model to guide the CTC training strategy: the more accurate Attention model is used to train the backbone network and obtain better convolutional features, and the CTC branch is then used for prediction, which keeps inference fast. Finally, a language model is added after the visual model for semantic correction. The language model fully learns the contextual information of the input to make up for the semantic deficiency of the visual model. Existing general-purpose text datasets contain very few images of sorting scenes, and this lack of domain data has created a bottleneck for applying deep learning to sorting scenes. We therefore simulate real form data to prepare our own sorting-scene dataset, and show through extensive experiments that CPBNet has advantages on this dataset and achieves state-of-the-art results.

Keywords: artificial intelligence; text detection and recognition; logistics
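The abstract describes training with an Attention branch alongside a CTC branch and predicting with the CTC branch alone. The following is a minimal sketch of that idea in plain Python, not the paper's implementation. The weighted-sum form of the training loss, the `attn_weight` parameter, the blank index, and greedy (best-path) decoding are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (not stated in the abstract)


def joint_training_loss(ctc_loss, attn_loss, attn_weight=0.5):
    """Hypothetical weighted sum of the two branch losses.

    The abstract only says the Attention branch guides CTC training;
    this particular formula and its weight are assumptions.
    """
    return (1.0 - attn_weight) * ctc_loss + attn_weight * attn_loss


def ctc_greedy_decode(frame_scores, charset):
    """Greedy CTC decoding, used here as a stand-in for the inference path.

    frame_scores: one list of scores per frame, ordered as [blank] + charset.
    charset: string whose i-th character corresponds to class index i + 1.
    """
    # Pick the highest-scoring class for each frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    # Collapse consecutive repeats, then drop blanks.
    collapsed = [k for k, _ in groupby(best)]
    return "".join(charset[k - 1] for k in collapsed if k != BLANK)


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # A
        [0.1, 0.7, 0.2],    # A (repeat, collapsed)
        [0.9, 0.05, 0.05],  # blank separates the two A's
        [0.1, 0.8, 0.1],    # A
        [0.2, 0.1, 0.7],    # B
    ]
    print(ctc_greedy_decode(scores, "AB"))  # prints AAB
```

Only `ctc_greedy_decode` runs at prediction time in this sketch, which mirrors the abstract's point that the Attention branch adds cost during training but not during inference.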
