Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning

Source: arXiv
Abstract

In this study, we introduce KOPL, a novel framework for handling Korean out-of-vocabulary (OOV) words with Phoneme representation Learning. Our work builds on a linguistic property of Korean as a phonemic script: the high correlation between phonemes and letters. KOPL combines phoneme and word representations for Korean OOV words, so that the resulting OOV word representations capture both the textual and the phonemic information of a word. We empirically demonstrate that KOPL significantly improves performance on Korean Natural Language Processing (NLP) tasks, and that it can be readily integrated into existing static and contextual Korean embedding models in a plug-and-play manner. Notably, we show that KOPL outperforms the state-of-the-art model by an average of 1.9%. Our code is available at https://github.com/jej127/KOPL.git.
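
The letter side of the phoneme-letter correspondence mentioned in the abstract can be illustrated with a short sketch. It is not taken from the KOPL code base and does not reproduce the authors' phoneme representation; it only splits precomposed Hangul syllables into their component letters (jamo) using the arithmetic defined in the Unicode Standard. The function name `decompose` is a hypothetical helper chosen for this illustration.

```python
# Illustrative sketch only: NOT the KOPL method, just Unicode Hangul arithmetic.
# Constants for the precomposed Hangul syllable block (U+AC00..U+D7A3).
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # leading consonants, vowels, trailing slots

LEADS = [chr(0x1100 + i) for i in range(L_COUNT)]
VOWELS = [chr(0x1161 + i) for i in range(V_COUNT)]
# Trailing index 0 means "no final consonant".
TAILS = [""] + [chr(0x11A8 + i) for i in range(T_COUNT - 1)]


def decompose(word):
    """Split each Hangul syllable in `word` into its conjoining jamo.

    Characters outside the Hangul syllable block are passed through unchanged.
    `decompose` is a hypothetical helper name, not part of any library.
    """
    letters = []
    for ch in word:
        index = ord(ch) - S_BASE
        if 0 <= index < L_COUNT * V_COUNT * T_COUNT:
            lead, rest = divmod(index, V_COUNT * T_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            letters.append(LEADS[lead])
            letters.append(VOWELS[vowel])
            if tail:  # skip the empty "no final consonant" slot
                letters.append(TAILS[tail])
        else:
            letters.append(ch)
    return letters


if __name__ == "__main__":
    print(decompose("한국어"))
```

Because each syllable block maps deterministically to two or three letters, a letter sequence of this kind is a plausible starting point for approximating pronunciation. Korean pronunciation rules such as assimilation across syllable boundaries are not modeled here, so the output should be read as letters rather than true phonemes.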

Nayeon Kim, Eojin Jeon, Jun-Hyung Park, SangKeun Lee

10.1007/978-981-96-8180-8_38

Northeast Asian languages; computing technology and computer technology

Nayeon Kim, Eojin Jeon, Jun-Hyung Park, SangKeun Lee. Handling Korean Out-of-Vocabulary Words with Phoneme Representation Learning [EB/OL]. (2025-07-05) [2025-07-21]. https://arxiv.org/abs/2507.04018.
