Generalization vs. Specialization under Concept Shift
Machine learning models are often brittle under distribution shift, i.e., when the data distribution at test time differs from the one seen during training. Understanding this failure mode is central to identifying and mitigating the safety risks of mass adoption of machine learning. Here we analyze ridge regression under concept shift, a form of distribution shift in which the input-label relationship changes at test time. We derive an exact expression for the prediction risk in the thermodynamic limit. Our results reveal nontrivial effects of concept shift on generalization performance, including a phase transition between weak and strong concept shift regimes and a nonmonotonic dependence of test performance on the amount of training data even when double descent is absent. Our theoretical results are in good agreement with experiments on transformers pretrained to solve linear regression; under concept shift, an overly long context can degrade the generalization performance of next-token prediction. Finally, our experiments on MNIST and FashionMNIST suggest that this intriguing behavior also arises in classification problems.
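The setup in the abstract (ridge regression fit to one input-label relationship and evaluated under a shifted one) can be sketched numerically. The snippet below is an illustrative assumption, not the paper's exact model: the `shift` knob and the rotation-style parameterization of the shifted teacher `w_test` are hypothetical choices made here to show how concept shift degrades test risk while the input distribution stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50          # input dimension
n_train = 200   # number of training samples
lam = 1e-2      # ridge penalty

# Teacher weights for the training ("source") concept. Under concept
# shift, the input distribution is unchanged but the input-label map
# changes at test time; here we interpolate toward a fresh teacher.
w_train = rng.normal(size=d) / np.sqrt(d)
shift = 0.8  # hypothetical knob: 0 = no shift, 1 = entirely new concept
w_test = np.sqrt(1 - shift**2) * w_train + shift * rng.normal(size=d) / np.sqrt(d)

# Fit ridge regression on data labeled by the training-time teacher.
X = rng.normal(size=(n_train, d))
y = X @ w_train + 0.1 * rng.normal(size=n_train)
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Compare prediction risk when the test labels come from the original
# teacher versus the shifted teacher (same input distribution).
X_new = rng.normal(size=(5000, d))
risk_same_concept = np.mean((X_new @ w_train - X_new @ w_hat) ** 2)
risk_shifted = np.mean((X_new @ w_test - X_new @ w_hat) ** 2)
print(f"risk, same concept:    {risk_same_concept:.4f}")
print(f"risk, shifted concept: {risk_shifted:.4f}")
```

With these settings the risk under the shifted teacher is dominated by the teacher mismatch rather than by estimation error, so it is substantially larger than the same-concept risk; the paper's exact asymptotic analysis characterizes how this gap behaves in the thermodynamic limit.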
Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn
Computing technology, computer technology
Alex Nguyen, David J. Schwab, Vudtiwat Ngampruetikorn. Generalization vs. Specialization under Concept Shift [EB/OL]. (2025-07-03) [2025-07-16]. https://arxiv.org/abs/2409.15582.