
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images

Source: arXiv
English Abstract

Modern AI models excel in controlled settings but often fail in real-world scenarios where data distributions shift unpredictably, a challenge known as domain generalisation (DG). This paper tackles this limitation by rigorously evaluating vision transformers, specifically the BEIT architecture, a model pre-trained with masked image modelling (MIM), against synthetic out-of-distribution (OOD) benchmarks designed to mimic real-world noise and occlusions. We introduce a novel framework to generate OOD test cases by strategically masking object regions in images using grid patterns (25%, 50%, 75% occlusion) and leveraging cutting-edge zero-shot segmentation via Segment Anything and Grounding DINO to ensure precise object localisation. Experiments across three benchmarks (PACS, Office-Home, DomainNet) demonstrate BEIT's robustness: it maintains 94% accuracy on PACS and 87% on Office-Home despite significant occlusions, outperforming CNNs and other vision transformers by margins of up to 37%. Analysis of self-attention distances reveals that BEIT's dependence on global features correlates with its resilience. Furthermore, our synthetic benchmarks expose critical failure modes: performance degrades sharply when occlusions disrupt object shapes (e.g. a 68% drop for external grid masking vs. a 22% drop for internal masking). This work provides two key advances: (1) a scalable method to generate OOD benchmarks using controllable noise, and (2) empirical evidence that MIM and the self-attention mechanism in vision transformers enhance DG by learning invariant features. These insights bridge the gap between lab-trained models and real-world deployment, offering a blueprint for building AI systems that generalise reliably under uncertainty.
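The abstract only sketches the benchmark-generation step at a high level. Below is a minimal Python sketch, not the authors' implementation, of how grid-pattern occlusion of a segmented object region could be realised. It assumes the binary object mask has already been produced by a zero-shot pipeline such as Grounding DINO (box prompt) followed by Segment Anything (segmentation); the function name grid_mask_object, the grid_size parameter, the majority-overlap rule, and the internal/external mode switch are illustrative assumptions rather than details taken from the paper.

import numpy as np

def grid_mask_object(image: np.ndarray,
                     object_mask: np.ndarray,
                     occlusion_ratio: float = 0.5,
                     grid_size: int = 16,
                     mode: str = "internal",
                     fill_value: int = 0,
                     seed: int = 0) -> np.ndarray:
    """Black out a fraction of grid cells relative to a segmented object.

    image           : H x W x C uint8 array.
    object_mask     : H x W boolean array, True inside the object (assumed to come
                      from Grounding DINO boxes refined by Segment Anything).
    occlusion_ratio : fraction of candidate grid cells to occlude
                      (e.g. 0.25, 0.5, 0.75 as in the paper's grid patterns).
    mode            : "internal" targets cells lying mostly on the object;
                      "external" targets cells lying mostly on the background.
    """
    rng = np.random.default_rng(seed)
    out = image.copy()
    h, w = object_mask.shape

    # Collect grid cells whose majority overlap with the object matches the mode.
    candidates = []
    for y in range(0, h, grid_size):
        for x in range(0, w, grid_size):
            on_object = object_mask[y:y + grid_size, x:x + grid_size].mean() > 0.5
            if (mode == "internal") == on_object:
                candidates.append((y, x))
    if not candidates:
        return out

    # Randomly occlude the requested fraction of candidate cells.
    n_masked = int(round(occlusion_ratio * len(candidates)))
    for idx in rng.choice(len(candidates), size=n_masked, replace=False):
        y, x = candidates[idx]
        out[y:y + grid_size, x:x + grid_size] = fill_value
    return out

# Toy usage: 50% internal occlusion of a square "object" in a blank image.
image = np.full((224, 224, 3), 255, dtype=np.uint8)
object_mask = np.zeros((224, 224), dtype=bool)
object_mask[64:160, 64:160] = True
occluded = grid_mask_object(image, object_mask, occlusion_ratio=0.5, mode="internal")

Varying mode between "internal" and "external" under this sketch would correspond to the two failure regimes contrasted in the abstract, where occlusions that disrupt object shape degrade accuracy far more than occlusions confined to the object interior.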

Hamza Riaz, Alan F. Smeaton

Subject: Computing Technology; Computer Technology

Hamza Riaz, Alan F. Smeaton. Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images [EB/OL]. (2025-04-05) [2025-05-15]. https://arxiv.org/abs/2504.04225.
