Training Kindai OCR with parallel textline images and self-attention feature distance-based loss

Source: arXiv
Abstract

Kindai documents, written in modern Japanese from the late 19th to the early 20th century, hold significant historical value for researchers studying the societal structures, daily life, and environmental conditions of that period. However, transcribing these documents remains labor-intensive and time-consuming, so annotated data for training optical character recognition (OCR) systems is limited. This research addresses the data scarcity by leveraging parallel textline images (pairs of original Kindai text and their counterparts in contemporary Japanese fonts) to augment training datasets. We introduce a distance-based objective function that minimizes the gap between the self-attention features of the parallel image pairs. Specifically, we explore Euclidean distance and Maximum Mean Discrepancy (MMD) as domain adaptation metrics. Experimental results demonstrate that our method reduces the character error rate (CER) by 2.23% and 3.94% over a Transformer-based OCR baseline when using Euclidean distance and MMD, respectively. Furthermore, our approach improves the discriminative quality of self-attention representations, leading to more effective OCR performance for historical documents.
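The two distances named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the feature shapes, the kernel, or how the distance term is combined with the recognition loss. The sketch assumes each image yields a list of equal-length feature vectors, uses an RBF kernel with a hypothetical bandwidth `sigma` for MMD, and adds the distance to the recognition loss with a hypothetical `weight`. All function names are illustrative.

```python
import math


def mean_euclidean(xs, ys):
    """Average Euclidean distance between position-aligned feature vectors."""
    if len(xs) != len(ys):
        raise ValueError("feature sequences must have the same length")
    return sum(math.dist(x, y) for x, y in zip(xs, ys)) / len(xs)


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel value; the kernel choice and sigma are assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def total_loss(recognition_loss, xs, ys, weight=0.1, metric="mmd"):
    """Hypothetical combination: recognition loss plus a weighted distance term."""
    if metric == "mmd":
        distance = mmd_squared(xs, ys)
    else:
        distance = mean_euclidean(xs, ys)
    return recognition_loss + weight * distance


# Toy features standing in for self-attention outputs of a parallel image pair.
kindai_feats = [[0.0, 0.0], [1.0, 1.0]]
modern_feats = [[3.0, 4.0], [4.0, 5.0]]
print(mean_euclidean(kindai_feats, modern_feats))
print(mmd_squared(kindai_feats, modern_feats))
```

Both terms fall to zero when the two feature sets coincide, which is the behavior a feature-alignment objective of this kind relies on. The Euclidean variant compares vectors position by position, whereas MMD compares the two sets as distributions and does not require aligned positions.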

Anh Le, Asanobu Kitamoto

Computing technology, computer technology

Anh Le, Asanobu Kitamoto. Training Kindai OCR with parallel textline images and self-attention feature distance-based loss [EB/OL]. (2025-08-12) [2025-08-24]. https://arxiv.org/abs/2508.08537.
