Clarifying orthography: Orthographic transparency as compressibility
Orthographic transparency -- how directly spelling is related to sound -- lacks a unified, script-agnostic metric. Using ideas from algorithmic information theory, we quantify orthographic transparency in terms of the mutual compressibility between orthographic and phonological strings. Our measure provides a principled way to combine two factors that decrease orthographic transparency, capturing both irregular spellings and rule complexity in one quantity. We estimate our transparency measure using prequential code-lengths derived from neural sequence models. Evaluating 22 languages across a broad range of script types (alphabetic, abjad, abugida, syllabic, logographic) confirms common intuitions about relative transparency of scripts. Mutual compressibility offers a simple, principled, and general yardstick for orthographic transparency.
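The idea of prequential code-lengths can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a Laplace-smoothed count model stands in for the neural sequence models, the toy letters and phonemes are aligned one-to-one, and the "saving" L(phonemes) − L(phonemes | letters) is used as one simple proxy for shared compressibility, which may differ from the authors' exact definition. The names `prequential_bits` and `compression_saving` are hypothetical.

```python
import math
from collections import defaultdict


def prequential_bits(symbols, contexts, alphabet_size):
    """Code length in bits when each symbol is encoded with a model
    fitted only to the symbols seen before it (predict, then update)."""
    counts = defaultdict(int)  # (context, symbol) -> occurrences so far
    totals = defaultdict(int)  # context -> occurrences so far
    bits = 0.0
    for sym, ctx in zip(symbols, contexts):
        # Laplace-smoothed predictive probability (assumed stand-in model).
        p = (counts[(ctx, sym)] + 1) / (totals[ctx] + alphabet_size)
        bits -= math.log2(p)
        counts[(ctx, sym)] += 1
        totals[ctx] += 1
    return bits


def compression_saving(letters, phonemes):
    """Bits saved on the phoneme string when the aligned letter is known."""
    k = len(set(phonemes))
    alone = prequential_bits(phonemes, [None] * len(phonemes), k)
    given = prequential_bits(phonemes, letters, k)
    return alone - given


# Toy data (invented for illustration, not from the paper).
letters = list("abc" * 15)
transparent = [c.upper() for c in letters]  # each letter has one fixed sound
opaque = list("AAABBBCCC" * 5)              # sound is unrelated to the letter

transparent_saving = compression_saving(letters, transparent)
opaque_saving = compression_saving(letters, opaque)
```

In this toy setting the one-to-one spelling yields a large positive saving, while the unrelated spelling yields none, because the conditional model pays to learn a mapping that carries no information.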
Charles J. Torres, Richard Futrell
Linguistics
Charles J. Torres, Richard Futrell. Clarifying orthography: Orthographic transparency as compressibility [EB/OL]. (2025-05-19) [2025-07-09]. https://arxiv.org/abs/2505.13657.