Assessing the impact of Binarization for Writer Identification in Greek Papyrus
This paper tackles the task of writer identification for Greek papyri. A common preprocessing step in writer identification pipelines is image binarization, which prevents the model from learning background features. This is challenging for historical documents, in our case Greek papyri, as the background is often non-uniform, fragmented, and discolored, with visible fiber structures. We compare traditional binarization methods to state-of-the-art Deep Learning (DL) models and evaluate the impact of binarization quality on subsequent writer identification performance. The DL models are trained with and without a custom data augmentation technique, and different model selection criteria are applied. The performance of these binarization methods is then systematically evaluated on the DIBCO 2019 dataset. The impact of binarization on writer identification is subsequently evaluated using a state-of-the-art writer identification approach. The results of this analysis highlight the influence of data augmentation for DL methods. Furthermore, the findings indicate a strong correlation between binarization effectiveness on the papyri documents of DIBCO 2019 and downstream writer identification performance.
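To make the comparison concrete, the following is a minimal sketch of one traditional global binarization technique (Otsu's thresholding) together with a pixel-level F-measure against a ground-truth mask. The abstract does not say which traditional methods or which metrics the authors use, so both choices are assumptions made for illustration; the function names `otsu_threshold`, `binarize`, and `f_measure` are hypothetical, and images are represented as plain nested lists of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class (assumed to be ink).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to a mask: 1 = ink, 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


def f_measure(pred, truth):
    """Pixel-level F-measure of the ink class between two same-sized masks."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, g in zip(pred_row, truth_row):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```

A single global threshold like this tends to struggle with exactly the non-uniform, fibrous backgrounds described above, which is part of the motivation for comparing against learned models.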
Dominic Akt, Marco Peer, Florian Kleber
Linguistics
Dominic Akt, Marco Peer, Florian Kleber. Assessing the impact of Binarization for Writer Identification in Greek Papyrus [EB/OL]. (2025-06-18) [2025-07-03]. https://arxiv.org/abs/2506.15852.