National Preprint Platform

Shared Global and Local Geometry of Language Model Embeddings

Source: arXiv
Abstract

Researchers have recently suggested that models share common representations. In this work, we find numerous geometric similarities across the token embeddings of large language models. First, we find "global" similarities: token embeddings often share similar relative orientations. Next, we characterize local geometry in two ways: (1) using Locally Linear Embeddings, and (2) defining a simple measure of the intrinsic dimension of each embedding. Both characterizations reveal local similarities across token embeddings. In addition, our intrinsic-dimension measure shows that the embeddings lie on a lower-dimensional manifold, and that tokens with lower intrinsic dimension often form semantically coherent clusters, while tokens with higher intrinsic dimension do not. Building on these findings, we introduce EMB2EMB, a simple application that linearly transforms steering vectors from one language model to another, even when the two models have different embedding dimensions.
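
The abstract does not specify how relative orientations are compared or how the EMB2EMB map is fit. The Python sketch below assumes one simple reading and is not the authors' exact procedure: "global" similarity is scored by correlating the two models' token-to-token cosine-similarity matrices over a shared vocabulary, and the cross-model map is an ordinary least-squares fit from one model's token embeddings to the other's, which is then applied to a steering vector. The function names (`cosine_matrix`, `global_similarity`, `fit_linear_map`, `transfer_vector`) and the synthetic data are hypothetical; the sketch requires NumPy.

```python
import numpy as np


def cosine_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between rows (one row per token)."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return normed @ normed.T


def global_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Correlate the off-diagonal cosine similarities of two embedding matrices.

    Assumption: row i of emb_a and row i of emb_b embed the same token.
    The widths d_a and d_b may differ, because both cosine matrices are
    n_tokens x n_tokens.
    """
    cos_a, cos_b = cosine_matrix(emb_a), cosine_matrix(emb_b)
    off_diag = ~np.eye(cos_a.shape[0], dtype=bool)
    return float(np.corrcoef(cos_a[off_diag], cos_b[off_diag])[0, 1])


def fit_linear_map(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Least-squares W of shape (d_a, d_b) minimizing ||emb_a @ W - emb_b||_F."""
    w, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)
    return w


def transfer_vector(vec_a: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Map a d_a-dimensional vector (e.g. a steering vector) into model B's space."""
    return vec_a @ w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for two models' embeddings of the same 200 tokens:
    # model B is model A seen through a width-changing isometry (32 -> 48).
    emb_a = rng.normal(size=(200, 32))
    q, _ = np.linalg.qr(rng.normal(size=(48, 32)))  # q has orthonormal columns
    emb_b = emb_a @ q.T
    print("global similarity:", global_similarity(emb_a, emb_b))

    w = fit_linear_map(emb_a, emb_b)
    steer_a = rng.normal(size=32)  # hypothetical steering vector for model A
    steer_b = transfer_vector(steer_a, w)
    print("transfer error:", np.linalg.norm(steer_b - steer_a @ q.T))
```

Because the cosine matrices are indexed by tokens rather than by embedding coordinates, the comparison works between models of different widths, and a (d_a x d_b) matrix can likewise carry a vector between spaces of different dimension. Real embeddings are not related by an exact isometry, so on actual models both the similarity score and the transferred vector would only be approximate.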

Andrew Lee, Melanie Weber, Fernanda Viégas, Martin Wattenberg

Subject: Linguistics

Andrew Lee, Melanie Weber, Fernanda Viégas, Martin Wattenberg. Shared Global and Local Geometry of Language Model Embeddings [EB/OL]. (2025-07-15) [2025-07-23]. https://arxiv.org/abs/2503.21073.
