Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention
We propose a learnable two-sided short-time Laplace transform (STLT) mechanism to replace the traditional self-attention in transformer-based LLMs. Our STLT introduces trainable parameters for each Laplace node, enabling end-to-end learning of decay rates, oscillatory frequencies, and the window bandwidth T. This flexibility allows the model to dynamically adapt token-relevance half-lives and frequency responses during training. By selecting S learnable nodes and leveraging fast recursive convolution, we achieve an effective complexity of O(N·S) in time and memory. We further incorporate an efficient FFT-based computation of the relevance matrix and an adaptive node-allocation mechanism that dynamically adjusts the number of active Laplace nodes. Empirical results on language modeling (WikiText-103, Project Gutenberg), machine translation (WMT'14 En-De), and long-document question answering (NarrativeQA) demonstrate that our learnable STLT achieves perplexities and task scores on par with or better than existing efficient transformers, while naturally extending to context lengths exceeding 100k tokens, limited only by available hardware. Ablation studies confirm the importance of the learnable parameters and of adaptive node allocation. The proposed approach combines interpretability, through explicit decay and frequency parameters, with scalability and robustness, offering a path toward ultra-long-sequence language modeling without the computational bottleneck of self-attention.
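To make the mechanism concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of how learnable Laplace nodes with trainable decays and frequencies could mix tokens. The module name STLTMixer, the parameterization (softplus-constrained decays, per-node amplitudes), and the use of an O(N log N) FFT causal convolution in place of the paper's O(N·S) recursive scheme are all assumptions made for illustration.

```python
# Hypothetical sketch of an STLT-style token-mixing layer (illustrative only).
# Assumption: each Laplace node s has a learnable decay sigma_s > 0 and
# frequency omega_s, and tokens are mixed by a causal convolution with
#   k(t) = sum_s a_s * exp(-sigma_s * t) * cos(omega_s * t),  t = 0..N-1,
# evaluated here via FFT convolution rather than the paper's recursion.
import torch
import torch.nn as nn
import torch.nn.functional as F


class STLTMixer(nn.Module):
    def __init__(self, d_model: int, num_nodes: int = 16):
        super().__init__()
        # Unconstrained parameters; softplus keeps the decay rates positive.
        self.log_sigma = nn.Parameter(torch.randn(num_nodes))
        self.omega = nn.Parameter(torch.randn(num_nodes))
        self.amp = nn.Parameter(torch.randn(num_nodes) / num_nodes)
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def kernel(self, length: int, device, dtype):
        # Build the length-N mixing kernel from the S learnable nodes.
        t = torch.arange(length, device=device, dtype=dtype)            # (N,)
        sigma = F.softplus(self.log_sigma)                               # (S,)
        decay = torch.exp(-sigma[:, None] * t[None, :])                  # (S, N)
        osc = torch.cos(self.omega[:, None] * t[None, :])                # (S, N)
        return (self.amp[:, None] * decay * osc).sum(dim=0)              # (N,)

    def forward(self, x):                      # x: (batch, N, d_model)
        b, n, d = x.shape
        v = self.in_proj(x)
        k = self.kernel(n, x.device, x.dtype)
        # Causal convolution via FFT; zero-padding to 2N avoids circular wrap.
        fft_len = 2 * n
        K = torch.fft.rfft(k, n=fft_len)                                  # (fft_len//2+1,)
        V = torch.fft.rfft(v.transpose(1, 2), n=fft_len)                  # (b, d, fft_len//2+1)
        y = torch.fft.irfft(V * K, n=fft_len)[..., :n].transpose(1, 2)    # (b, N, d)
        return self.out_proj(y)
```

Under these assumptions the layer can be dropped in wherever a self-attention block would sit, e.g. `y = STLTMixer(d_model=512, num_nodes=32)(x)`; the interpretability claim corresponds to reading off the learned `sigma` (relevance half-lives) and `omega` (oscillatory frequencies) per node.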
Andrew Kiruluta
Andrew Kiruluta. Adaptive Two Sided Laplace Transforms: A Learnable, Interpretable, and Scalable Replacement for Self-Attention [EB/OL]. (2025-06-01) [2025-07-16]. https://arxiv.org/abs/2506.15714.