Two Spelling Normalization Approaches Based on Large Language Models
The absence of standardized spelling conventions and the organic evolution of human language present an inherent linguistic challenge in historical documents, a longstanding concern for scholars in the humanities. Spelling normalization addresses this issue by aligning a document's orthography with contemporary standards. In this study, we propose two new approaches based on large language models: one trained without supervised training, and one trained for machine translation. Our evaluation spans multiple datasets covering diverse languages and historical periods. We conclude that, although both approaches yielded encouraging results, statistical machine translation still seems to be the most suitable technology for this task.
Miguel Domingo, Francisco Casacuberta
Linguistics
Miguel Domingo, Francisco Casacuberta. Two Spelling Normalization Approaches Based on Large Language Models [EB/OL]. (2025-06-29) [2025-07-19]. https://arxiv.org/abs/2506.23288.