
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images

Source: arXiv
English Abstract

Modern AI models excel in controlled settings but often fail in real-world scenarios where data distributions shift unpredictably, a challenge known as domain generalisation (DG). This paper tackles this limitation by rigorously evaluating vision transformers, specifically the BEIT architecture, a model pre-trained with masked image modelling (MIM), against synthetic out-of-distribution (OOD) benchmarks designed to mimic real-world noise and occlusions. We introduce a novel framework to generate OOD test cases by strategically masking object regions in images using grid patterns (25%, 50%, 75% occlusion) and leveraging cutting-edge zero-shot segmentation via Segment Anything and Grounding DINO to ensure precise object localisation. Experiments across three benchmarks (PACS, Office-Home, DomainNet) demonstrate BEIT's robustness: it maintains 94% accuracy on PACS and 87% on Office-Home despite significant occlusions, outperforming CNNs and other vision transformers by margins of up to 37%. Analysis of self-attention distances reveals that BEIT's dependence on global features correlates with its resilience. Furthermore, our synthetic benchmarks expose critical failure modes: performance degrades sharply when occlusions disrupt object shapes (e.g. a 68% drop for external grid masking vs. a 22% drop for internal masking). This work provides two key advances: (1) a scalable method to generate OOD benchmarks using controllable noise, and (2) empirical evidence that MIM and the self-attention mechanism in vision transformers enhance DG by learning invariant features. These insights bridge the gap between lab-trained models and real-world deployment, offering a blueprint for building AI systems that generalise reliably under uncertainty.
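The abstract only sketches the benchmark-generation step at a high level. Below is a minimal Python sketch, not the authors' implementation, of how grid-pattern occlusion of a segmented object region could be realised. It assumes the binary object mask has already been produced by a zero-shot pipeline such as Grounding DINO (box prompt) followed by Segment Anything (segmentation); the function name grid_mask_object, the grid_size parameter, the majority-overlap rule, and the internal/external mode switch are illustrative assumptions rather than details taken from the paper.

import numpy as np

def grid_mask_object(image: np.ndarray,
                     object_mask: np.ndarray,
                     occlusion_ratio: float = 0.5,
                     grid_size: int = 16,
                     mode: str = "internal",
                     fill_value: int = 0,
                     seed: int = 0) -> np.ndarray:
    """Black out a fraction of grid cells relative to a segmented object.

    image           : H x W x C uint8 array.
    object_mask     : H x W boolean array, True inside the object (assumed to come
                      from Grounding DINO boxes refined by Segment Anything).
    occlusion_ratio : fraction of candidate grid cells to occlude
                      (e.g. 0.25, 0.5, 0.75 as in the paper's grid patterns).
    mode            : "internal" targets cells lying mostly on the object;
                      "external" targets cells lying mostly on the background.
    """
    rng = np.random.default_rng(seed)
    out = image.copy()
    h, w = object_mask.shape

    # Collect grid cells whose majority overlap with the object matches the mode.
    candidates = []
    for y in range(0, h, grid_size):
        for x in range(0, w, grid_size):
            on_object = object_mask[y:y + grid_size, x:x + grid_size].mean() > 0.5
            if (mode == "internal") == on_object:
                candidates.append((y, x))
    if not candidates:
        return out

    # Randomly occlude the requested fraction of candidate cells.
    n_masked = int(round(occlusion_ratio * len(candidates)))
    for idx in rng.choice(len(candidates), size=n_masked, replace=False):
        y, x = candidates[idx]
        out[y:y + grid_size, x:x + grid_size] = fill_value
    return out

# Toy usage: 50% internal occlusion of a square "object" in a blank image.
image = np.full((224, 224, 3), 255, dtype=np.uint8)
object_mask = np.zeros((224, 224), dtype=bool)
object_mask[64:160, 64:160] = True
occluded = grid_mask_object(image, object_mask, occlusion_ratio=0.5, mode="internal")

Varying mode between "internal" and "external" under this sketch would correspond to the two failure regimes contrasted in the abstract, where occlusions that disrupt object shape degrade accuracy far more than occlusions confined to the object interior.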

Hamza Riaz, Alan F. Smeaton

Subject: Computing Technology; Computer Technology

Hamza Riaz, Alan F. Smeaton. Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images [EB/OL]. (2025-04-05) [2025-05-15]. https://arxiv.org/abs/2504.04225.
