National Preprint Platform

A Theoretical Framework for OOD Robustness in Transformers using Gevrey Classes

Source: arXiv

Abstract

We study the robustness of Transformer language models under semantic out-of-distribution (OOD) shifts, where training and test data lie in disjoint latent spaces. Using Wasserstein-1 distance and Gevrey-class smoothness, we derive sub-exponential upper bounds on prediction error. Our theoretical framework explains how smoothness governs generalization under distributional drift. We validate these findings through controlled experiments on arithmetic and Chain-of-Thought tasks with latent permutations and scalings. Results show empirical degradation aligns with our bounds, highlighting the geometric and functional principles underlying OOD generalization in Transformers.
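The abstract measures distributional shift with the Wasserstein-1 distance between training and test latent distributions. As a minimal illustrative sketch (not the paper's experimental setup), the snippet below uses SciPy's `wasserstein_distance` to compare in-distribution latents against a scaled-and-shifted OOD version; the specific scaling factors are hypothetical:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# In-distribution latent samples: standard normal.
train_latents = rng.normal(loc=0.0, scale=1.0, size=5000)

# Hypothetical semantic OOD shift: an affine scaling of the latent space,
# loosely analogous to the "latent permutations and scalings" in the paper.
test_latents = 2.0 * train_latents + 3.0

# Wasserstein-1 distance quantifies how far the test distribution
# has drifted from the training distribution.
w1 = wasserstein_distance(train_latents, test_latents)
print(f"W1(train, test) = {w1:.3f}")
```

A larger W1 value indicates a stronger latent shift; the paper's bounds relate such a shift to a sub-exponential growth in prediction error for Gevrey-smooth models.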

Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

Subjects: Mathematical and Computational Technology; Computer Technology

Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu. A Theoretical Framework for OOD Robustness in Transformers using Gevrey Classes [EB/OL]. (2025-04-17) [2025-06-07]. https://arxiv.org/abs/2504.12991.
