
Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing


Abstract

Automated parsing of scanned documents into richly structured, machine-readable formats remains a critical bottleneck in Document AI, as traditional multi-stage pipelines suffer from error propagation and limited adaptability to diverse layouts. We introduce layoutRL, an end-to-end reinforcement learning framework that trains models to be explicitly layout-aware by optimizing a composite reward of normalized edit distance, paragraph count accuracy, and reading order preservation. Leveraging our newly released dataset, Infinity-Doc-55K, which combines 55K high-fidelity synthetic scanned document parsing data with expert-filtered real-world documents, we instantiate layoutRL in a vision-language-model-based parser called Infinity-Parser. Evaluated on English and Chinese benchmarks for OCR, table and formula extraction, and reading order detection, Infinity-Parser achieves new state-of-the-art performance in both accuracy and structural fidelity, outpacing specialist pipelines and general-purpose vision-language models. We will publicly release our code and dataset to accelerate progress in robust document understanding.
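The abstract names the three reward components but not their exact definitions or how they are combined. As a rough illustration only, the sketch below assumes paragraphs are separated by blank lines, paragraphs are matched by exact string equality when checking order, and the three terms are averaged with equal weights. The function names (`edit_reward`, `count_reward`, `order_reward`, `layout_reward`) and the weights are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_reward(pred: str, ref: str) -> float:
    """1 minus the edit distance normalized by the longer string, in [0, 1]."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / longest


def split_paragraphs(text: str) -> list[str]:
    """Assumption: paragraphs are separated by blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def count_reward(pred_paras: list[str], ref_paras: list[str]) -> float:
    """Penalize the relative difference in paragraph count (assumed form)."""
    if not ref_paras:
        return 1.0 if not pred_paras else 0.0
    diff = abs(len(pred_paras) - len(ref_paras))
    return max(0.0, 1.0 - diff / len(ref_paras))


def order_reward(pred_paras: list[str], ref_paras: list[str]) -> float:
    """1 minus the fraction of inverted pairs among exactly matched paragraphs."""
    positions = [pred_paras.index(p) for p in ref_paras if p in pred_paras]
    n = len(positions)
    if n < 2:
        return 1.0
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if positions[i] > positions[j]
    )
    return 1.0 - inversions / (n * (n - 1) / 2)


def layout_reward(pred: str, ref: str, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted average of the three terms; equal weights are an assumption."""
    pred_paras, ref_paras = split_paragraphs(pred), split_paragraphs(ref)
    terms = (
        edit_reward(pred, ref),
        count_reward(pred_paras, ref_paras),
        order_reward(pred_paras, ref_paras),
    )
    return sum(w * t for w, t in zip(weights, terms)) / sum(weights)


if __name__ == "__main__":
    reference = "Title\n\nFirst paragraph.\n\nSecond paragraph."
    swapped = "Title\n\nSecond paragraph.\n\nFirst paragraph."
    print(layout_reward(reference, reference))
    print(layout_reward(swapped, reference))
```

Under these assumptions a perfect parse scores 1.0, while a parse with the right paragraphs in the wrong order loses credit on both the edit-distance and reading-order terms but keeps full credit for paragraph count.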

Authors: Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Yanjie Liang, Zuming Huang, Haozhe Wang, Jun Huang, Ling Chen, Wei Chu, Yuan Qi


Subject classification: Computing technology, computer technology

Recommended citation: Baode Wang, Biao Wu, Weizhen Li, Meng Fang, Yanjie Liang, Zuming Huang, Haozhe Wang, Jun Huang, Ling Chen, Wei Chu, Yuan Qi. Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing [EB/OL]. (2025-06-01) [2025-07-21]. https://arxiv.org/abs/2506.03197.

