PaddleOCR 3.0 Technical Report

Source: arXiv
Abstract

This technical report introduces PaddleOCR 3.0, an Apache-licensed open-source toolkit for OCR and document parsing. To address the growing demand for document understanding in the era of large language models, PaddleOCR 3.0 presents three major solutions: (1) PP-OCRv5 for multilingual text recognition, (2) PP-StructureV3 for hierarchical document parsing, and (3) PP-ChatOCRv4 for key information extraction. Despite having fewer than 100 million parameters, these models achieve accuracy and efficiency competitive with mainstream billion-parameter vision-language models (VLMs). In addition to offering a high-quality OCR model library, PaddleOCR 3.0 provides efficient tools for training, inference, and deployment, supports heterogeneous hardware acceleration, and enables developers to easily build intelligent document applications.

Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, Yue Zhang, Wenyu Lv, Kui Huang, Yichao Zhang, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, Yanjun Ma

Computing technology; computer technology

Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, Yue Zhang, Wenyu Lv, Kui Huang, Yichao Zhang, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, Yanjun Ma. PaddleOCR 3.0 Technical Report [EB/OL]. (2025-07-08) [2025-07-21]. https://arxiv.org/abs/2507.05595.
