ROSA: Addressing text understanding challenges in photographs via ROtated SAmpling
Visually impaired people could benefit from Visual Question Answering (VQA) systems to interpret text in their surroundings. However, current models often struggle to recognize text in photos taken by this population. Through in-depth interviews with visually impaired individuals, we identified common framing conventions that frequently result in misaligned text. Existing VQA benchmarks primarily feature well-oriented text captured by sighted users, under-representing these challenges. To address this gap, we introduce ROtated SAmpling (ROSA), a decoding strategy that enhances VQA performance on text-rich images with incorrectly oriented text. ROSA outperforms greedy decoding by 11.7 absolute points with the best-performing model.
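The abstract names ROSA as a decoding strategy but does not spell out its mechanism. Below is a minimal sketch of one plausible reading of "rotated sampling" as a test-time procedure: run the VQA model on several rotated copies of the image and keep the answer the model is most confident about. The `VQAModel` interface, the angle set, and the confidence-based selection are illustrative assumptions, not the authors' published method.

```python
from typing import Callable, Sequence, Tuple
from PIL import Image

# Hypothetical VQA interface: takes an image and a question, returns
# (answer, log_probability_of_answer). Any VLM decoding loop that
# exposes sequence log-likelihoods could be plugged in here.
VQAModel = Callable[[Image.Image, str], Tuple[str, float]]


def rotated_sampling(
    model: VQAModel,
    image: Image.Image,
    question: str,
    angles: Sequence[int] = (0, 90, 180, 270),
) -> str:
    """Query the model on rotated copies of the image and return the
    answer with the highest model confidence (log-probability)."""
    best_answer, best_score = "", float("-inf")
    for angle in angles:
        # expand=True keeps the full rotated canvas instead of cropping it.
        rotated = image.rotate(angle, expand=True)
        answer, log_prob = model(rotated, question)
        if log_prob > best_score:
            best_answer, best_score = answer, log_prob
    return best_answer
```

The angle set and the max-confidence aggregation rule are design choices for illustration only; the paper should be consulted for the actual sampling and selection procedure.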
Hernán Maina, Guido Ivetta, Mateo Lione Stuto, Julian Martin Eisenschlos, Jorge Sánchez, Luciana Benotti
Computing Technology; Computer Technology
Hernán Maina, Guido Ivetta, Mateo Lione Stuto, Julian Martin Eisenschlos, Jorge Sánchez, Luciana Benotti. ROSA: Addressing text understanding challenges in photographs via ROtated SAmpling [EB/OL]. (2025-06-04) [2025-07-16]. https://arxiv.org/abs/2506.03665.