
An extension of linear self-attention for in-context learning


Source: arXiv
Abstract

In-context learning is a remarkable property of transformers and has been the focus of recent research. The attention mechanism is a key component of transformers: an attention matrix encodes the relationships between the words of a sentence and is used to weight those words. This mechanism is effective for capturing language representations. However, it is questionable whether naive self-attention is suitable for in-context learning on general tasks, since the computation it implements is somewhat restrictive in terms of matrix multiplication; indeed, heuristic implementations of computational algorithms may require an appropriately designed input form. In this paper, we extend linear self-attention by introducing a bias matrix in addition to the weight matrix applied to the input. Despite the simplicity of this extension, the extended linear self-attention can output any constant matrix, the input matrix itself, and products of two or three matrices contained in the input. Note that the second property means it can act as a skip connection. Flexible matrix manipulations can therefore be implemented by connecting extended linear self-attention components. As an example of an implementation using the extended linear self-attention, we show a heuristic construction of batch gradient descent for ridge regression under a reasonable input form.
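As a rough illustration of the extension described in the abstract, the sketch below parameterizes each input map of a linear self-attention layer as an affine map W Z + B instead of a purely linear one, giving an output of the form (W_V Z + B_V)(W_K Z + B_K)^T (W_Q Z + B_Q). This particular factorization, the helper name extended_linear_attention, and the dimension choices are illustrative assumptions rather than the paper's exact definition; the sketch only shows how zero weights and suitable biases can produce a constant output, a skip connection, and a three-factor product of matrices taken from the input.

import numpy as np

def extended_linear_attention(Z, W_v, B_v, W_k, B_k, W_q, B_q):
    # One possible affine parameterization (an assumption, not the paper's
    # exact definition): (W_v Z + B_v) (W_k Z + B_k)^T (W_q Z + B_q).
    V = W_v @ Z + B_v
    K = W_k @ Z + B_k
    Q = W_q @ Z + B_q
    return V @ K.T @ Q

rng = np.random.default_rng(0)
d, n = 6, 4                        # token dimension >= number of tokens
Z = rng.standard_normal((d, n))    # input matrix (columns are tokens)
I_dn = np.eye(d, n)                # key/query biases with I_dn.T @ I_dn = I_n
Zero = np.zeros((d, d))

# 1) Any constant matrix C: zero weights, value bias C,
#    key/query biases whose product is the identity.
C = rng.standard_normal((d, n))
out1 = extended_linear_attention(Z, Zero, C, Zero, I_dn, Zero, I_dn)
assert np.allclose(out1, C)

# 2) The input itself (a skip connection): identity value weight,
#    constant key/query maps as above.
out2 = extended_linear_attention(Z, np.eye(d), np.zeros((d, n)),
                                 Zero, I_dn, Zero, I_dn)
assert np.allclose(out2, Z)

# 3) A product of three matrices built from the input: zero biases reduce the
#    layer to ordinary linear self-attention, (W_v Z)(W_k Z)^T (W_q Z).
out3 = extended_linear_attention(Z, np.eye(d), np.zeros((d, n)),
                                 np.eye(d), np.zeros((d, n)),
                                 np.eye(d), np.zeros((d, n)))
assert np.allclose(out3, Z @ Z.T @ Z)

Under this reading, stacking such components with a block-structured input would be the route to the batch gradient-descent construction for ridge regression mentioned in the abstract, whose standard update is w ← w − η (X^T (X w − y) + λ w); the details of that construction are given in the paper itself.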

Katsuyuki Hagiwara

Subjects: Computing and Computer Technology; Fundamental Theory of Automation

Katsuyuki Hagiwara. An extension of linear self-attention for in-context learning [EB/OL]. (2025-03-31) [2025-04-29]. https://arxiv.org/abs/2503.23814.
