AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings
Current language models rely on static vocabularies determined at pretraining time, which can lead to decreased performance and increased computational cost for domains underrepresented in the original vocabulary. Adding new tokens can address this problem, provided their new embeddings are well initialized. However, existing embedding initialization methods either require expensive further training or pretraining of additional modules. In this paper, we propose AweDist and show that by distilling representations obtained using the original tokenization, we can quickly learn high-quality input embeddings for new tokens. Experimental results with a wide range of open-weight models show that AweDist outperforms even strong baselines.
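To make the core idea concrete, the following is a minimal, illustrative sketch of distilling an input embedding for a newly added token: hidden states produced under the original (multi-piece) tokenization serve as the teacher target, and only the new token's embedding row is optimized to reproduce them. The model choice, target layer, loss, word, and hyperparameters are assumptions for illustration, not the paper's exact procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any open-weight causal LM works for this sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

context = "Patients with"
new_word = " pneumonoultramicroscopic"  # hypothetical domain-specific word

# 1) Teacher pass: the original tokenizer splits the new word into several pieces.
teacher_ids = tok(context + new_word, return_tensors="pt").input_ids
with torch.no_grad():
    teacher_hidden = model(teacher_ids, output_hidden_states=True).hidden_states[-1][:, -1]

# 2) Add the new token and make only its input embedding trainable.
tok.add_tokens([new_word])
model.resize_token_embeddings(len(tok))
emb = model.get_input_embeddings()
new_id = tok.convert_tokens_to_ids(new_word)

for p in model.parameters():
    p.requires_grad_(False)
new_embedding = emb.weight[new_id].clone().requires_grad_(True)
optimizer = torch.optim.Adam([new_embedding], lr=1e-2)

# With the extended tokenizer, the new word is now a single (final) token.
student_ids = tok(context + new_word, return_tensors="pt").input_ids

# 3) Distill: match the hidden state at the new token's position to the
#    teacher hidden state obtained with the original tokenization.
for _ in range(50):
    optimizer.zero_grad()
    inputs_embeds = emb(student_ids).clone()
    inputs_embeds[0, -1] = new_embedding  # gradient flows only through the new row
    student_hidden = model(
        inputs_embeds=inputs_embeds, output_hidden_states=True
    ).hidden_states[-1][:, -1]
    loss = torch.nn.functional.mse_loss(student_hidden, teacher_hidden)
    loss.backward()
    optimizer.step()

# Write the learned embedding back into the embedding matrix.
with torch.no_grad():
    emb.weight[new_id] = new_embedding
```

In this sketch only the last layer's hidden state at a single position is matched; which layers and positions are supervised, and how attention is taken into account, are design choices of the actual method that the abstract does not specify.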
Konstantin Dobler, Desmond Elliott, Gerard de Melo
Linguistics
Konstantin Dobler, Desmond Elliott, Gerard de Melo. AweDist: Attention-aware Embedding Distillation for New Input Token Embeddings. arXiv:2505.20133 (2025-05-26) [accessed 2025-07-18]. https://arxiv.org/abs/2505.20133