
Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers

Source: arXiv
Abstract

We propose a novel framework, Continuous-Time Attention, which infuses partial differential equations (PDEs) into the Transformer's attention mechanism to address the challenges of extremely long input sequences. Instead of relying solely on a static attention matrix, we allow attention weights to evolve over a pseudo-time dimension via diffusion, wave, or reaction-diffusion dynamics. This mechanism systematically smooths local noise, enhances long-range dependencies, and stabilizes gradient flow. Theoretically, our analysis shows that PDE-based attention leads to better optimization landscapes and polynomial rather than exponential decay of distant interactions. Empirically, we benchmark our method on diverse experiments, demonstrating consistent gains over both standard and specialized long-sequence Transformer variants. Our findings highlight the potential of PDE-based formulations to enrich attention mechanisms with continuous-time dynamics and global coherence.
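The abstract describes attention weights evolving over a pseudo-time dimension under PDE dynamics but gives no implementation details here. As a rough illustration of the diffusion variant only, the PyTorch sketch below applies a few explicit Euler steps of a one-dimensional heat equation to the raw attention scores (along the key axis) before the softmax. The function name diffused_attention and the hyperparameters num_steps and tau are illustrative assumptions, not the paper's actual formulation, which should be taken from the arXiv preprint itself.

```python
import torch

def diffused_attention(q, k, v, num_steps=3, tau=0.2):
    # Scaled dot-product scores: shape (..., L_q, L_k).
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5

    # Illustrative assumption: a few explicit Euler steps of a 1-D heat
    # (diffusion) equation along the key axis, with replicated boundaries,
    # smoothing each row of scores before normalization.
    for _ in range(num_steps):
        left = torch.cat([scores[..., :1], scores[..., :-1]], dim=-1)
        right = torch.cat([scores[..., 1:], scores[..., -1:]], dim=-1)
        laplacian = left + right - 2.0 * scores
        scores = scores + tau * laplacian  # u <- u + tau * Laplacian(u)

    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# Toy usage: batch of 2, sequence length 128, head dimension 64.
q = k = v = torch.randn(2, 128, 64)
out = diffused_attention(q, k, v)  # shape (2, 128, 64)
```

In this sketch the explicit step size tau must stay at or below 0.5 for the discrete heat update to remain stable; the wave and reaction-diffusion variants mentioned in the abstract would replace the update rule inside the loop.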

Yukun Zhang, Xueqing Zhou

Subjects: Computing technology; computer technology

Yukun Zhang, Xueqing Zhou. Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers [EB/OL]. (2025-05-26) [2025-06-07]. https://arxiv.org/abs/2505.20666.
