
Multilingual Pretraining for Pixel Language Models

Source: arXiv
Abstract

Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual pretraining substantially enhances the capability of pixel language models to effectively support a diverse set of languages.
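As a rough illustration of the abstract's first sentence, the sketch below renders a text string into a grayscale image and slices it into fixed-size patches, the kind of input a pixel language model consumes instead of token IDs. It uses Pillow and NumPy; the rendering parameters (a 16-pixel-tall canvas, 16x16 patches, Pillow's default font) are illustrative assumptions, not the configuration used by PIXEL-M4.

# Minimal sketch: render text to pixels and cut it into patches.
# Assumptions (not taken from the paper): Pillow's default bitmap font,
# a 16-pixel-tall canvas, and 16x16 patches.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_to_patches(text: str, height: int = 16, patch: int = 16) -> np.ndarray:
    font = ImageFont.load_default()

    # Measure the rendered width, then pad it to a multiple of the patch size.
    measurer = ImageDraw.Draw(Image.new("L", (1, 1)))
    width = int(measurer.textlength(text, font=font))
    width = max(patch, ((width + patch - 1) // patch) * patch)

    # Draw black text on a white canvas.
    img = Image.new("L", (width, height), color=255)
    ImageDraw.Draw(img).text((0, 0), text, fill=0, font=font)

    # Split the image into a sequence of (patch x patch) pixel blocks,
    # the analogue of a token sequence for a pixel language model.
    arr = np.asarray(img)                                        # shape: (height, width)
    n = width // patch
    patches = arr.reshape(height, n, patch).transpose(1, 0, 2)   # (n, height, patch)
    return patches

seq = render_to_patches("pixel language models need no fixed vocabulary")
print(seq.shape)   # (num_patches, 16, 16)

Because the input is an image, any script the chosen font can draw can be encoded this way, which is what makes rendering-based models vocabulary-free.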

Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott

Language families: Indo-European; Sino-Tibetan

Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott. Multilingual Pretraining for Pixel Language Models [EB/OL]. (2025-05-27) [2025-06-19]. https://arxiv.org/abs/2505.21265.
