Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation

Source: arXiv
Abstract

We demonstrate a new approach to Neural Machine Translation (NMT) for low-resource languages using a ubiquitous linguistic resource, Interlinear Glossed Text (IGT). IGT represents a non-English sentence as a sequence of English lemmas and morpheme labels. As such, it can serve as a pivot or interlingua for NMT. Our contribution is four-fold. Firstly, we pool IGT for 1,497 languages in ODIN (54,545 glosses) and 70,918 glosses in Arapaho and train a gloss-to-target NMT system from IGT to English, with a BLEU score of 25.94. We introduce a multilingual NMT model that tags all glossed text with gloss-source language tags and train a universal system with shared attention across 1,497 languages. Secondly, we use the IGT gloss-to-target translation as a key step in an English-Turkish MT system trained on only 865 lines from ODIN. Thirdly, we present five metrics for evaluating extremely low-resource translation when BLEU is no longer sufficient, and we evaluate the Turkish low-resource system using BLEU as well as accuracy of matching nouns, verbs, agreement, tense, and spurious repetition, showing large improvements.
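
The language-tagging step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each gloss line is prefixed with a single token naming its source language before being paired with its English translation. The tag format `<xx>`, the function names `tag_gloss` and `build_pairs`, and the sample gloss are all hypothetical. They are not taken from the paper or from ODIN.

```python
from typing import Iterable, List, Tuple


def tag_gloss(gloss: str, lang_code: str) -> str:
    """Prepend a source-language tag token to a gloss line.

    The "<code>" token format is an assumption made for illustration.
    """
    return f"<{lang_code}> {gloss.strip()}"


def build_pairs(
    examples: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, str]]:
    """Turn (lang_code, gloss, translation) triples into
    (tagged gloss, English target) training pairs."""
    return [
        (tag_gloss(gloss, lang), translation.strip())
        for lang, gloss, translation in examples
    ]


# Invented example data, not drawn from ODIN or the Arapaho corpus.
examples = [
    ("tur", "I-GEN book-1SG.POSS", "my book"),
]

for source, target in build_pairs(examples):
    print(source, "=>", target)
```

Pooling tagged pairs from many languages into one training set is one common way to let a single model share parameters across languages; whether the authors' system does exactly this is not stated in the abstract.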

David R. Mortensen, Lori Levin, Zhong Zhou, Alex Waibel

Subjects: Linguistics; Indo-European languages

David R. Mortensen, Lori Levin, Zhong Zhou, Alex Waibel. Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation [EB/OL]. (2019-11-06) [2025-07-21]. https://arxiv.org/abs/1911.02709.