Image Recognition with Online Lightweight Vision Transformer: A Survey

Abstract

The Transformer architecture has achieved significant success in natural language processing, motivating its adaptation to computer vision tasks. Unlike convolutional neural networks, vision transformers inherently capture long-range dependencies and enable parallel processing, but they lack inductive biases and efficiency benefits, and they face significant computational and memory challenges that limit their real-world applicability. This paper surveys online strategies for generating lightweight vision transformers for image recognition, focusing on three key areas: Efficient Component Design, Dynamic Network, and Knowledge Distillation. We evaluate representative work on each topic on the ImageNet-1K benchmark, analyzing trade-offs among precision, parameters, throughput, and other metrics to highlight their respective advantages, disadvantages, and flexibility. Finally, we propose future research directions and discuss potential challenges in the lightweighting of vision transformers, with the aim of inspiring further exploration and providing practical guidance for the community. Project Page: https://github.com/ajxklo/Lightweight-VIT

Authors: Rongtao Xu, Jie Zhou, Changwei Wang, Xingtian Pei, Wenhao Xu, Jiguang Zhang, Li Guo, Longxiang Gao, Wenbo Xu, Shibiao Xu, Zherui Zhang

Subject classification: Computing technology, computer technology

Recommended citation: Rongtao Xu, Jie Zhou, Changwei Wang, Xingtian Pei, Wenhao Xu, Jiguang Zhang, Li Guo, Longxiang Gao, Wenbo Xu, Shibiao Xu, Zherui Zhang. Image Recognition with Online Lightweight Vision Transformer: A Survey [EB/OL]. (2025-05-05) [2025-05-23]. https://arxiv.org/abs/2505.03113.
