Training Kindai OCR with parallel textline images and self-attention feature distance-based loss

Abstract

Kindai documents, written in modern Japanese from the late 19th to early 20th century, hold significant historical value for researchers studying the societal structures, daily life, and environmental conditions of that period. However, transcribing these documents remains labor-intensive and time-consuming, so little annotated data is available for training optical character recognition (OCR) systems. This research addresses the data scarcity by leveraging parallel textline images (pairs of original Kindai text and their counterparts in contemporary Japanese fonts) to augment training datasets. We introduce a distance-based objective function that minimizes the gap between the self-attention features of the parallel image pairs. Specifically, we explore Euclidean distance and Maximum Mean Discrepancy (MMD) as domain adaptation metrics. Experimental results demonstrate that our method reduces the character error rate (CER) by 2.23% and 3.94% over a Transformer-based OCR baseline when using Euclidean distance and MMD, respectively. Furthermore, our approach improves the discriminative quality of self-attention representations, leading to more effective OCR performance for historical documents.
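The two distance terms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the feature shapes, the MMD kernel, or how the distance is weighted against the recognition loss. The sketch assumes each image yields a sequence of feature vectors, that the Euclidean term compares position-aligned vectors, and that MMD uses a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma`. All function names are illustrative.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean_loss(feats_a, feats_b):
    """Mean Euclidean distance between position-aligned feature vectors.

    Assumption: both sequences have the same length, so vector i of the
    Kindai image is compared with vector i of the modern-font image.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("feature sequences must have the same length")
    total = sum(math.sqrt(squared_distance(u, v))
                for u, v in zip(feats_a, feats_b))
    return total / len(feats_a)


def rbf_kernel(u, v, sigma):
    """Gaussian kernel; the kernel choice is an assumption of this sketch."""
    return math.exp(-squared_distance(u, v) / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(rbf_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(feats_a, feats_b, sigma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    return (mean_kernel(feats_a, feats_a, sigma)
            + mean_kernel(feats_b, feats_b, sigma)
            - 2.0 * mean_kernel(feats_a, feats_b, sigma))


# Toy features standing in for self-attention outputs of a parallel pair.
kindai_feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
modern_feats = [[0.1, 0.9], [0.9, 0.2], [0.4, 0.6]]

print(euclidean_loss(kindai_feats, modern_feats))
print(mmd_squared(kindai_feats, modern_feats))
```

Both quantities are zero when the two feature sets coincide and grow as they diverge. The Euclidean term needs aligned sequences, whereas MMD compares the two sets as distributions and does not require a one-to-one correspondence between positions.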

Authors: Anh Le, Asanobu Kitamoto

Subject classification: Computing technology, computer technology

Recommended citation: Anh Le, Asanobu Kitamoto. Training Kindai OCR with parallel textline images and self-attention feature distance-based loss [EB/OL]. (2025-08-12) [2025-08-24]. https://arxiv.org/abs/2508.08537.
