Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Subject: Computing Technology / Computer Technology
Yuchen Ren, Zhengyu Zhao, Chenhao Lin, Bo Yang, Lu Zhou, Zhe Liu, Chao Shen. Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement [EB/OL]. (2025-03-19) [2025-10-25]. https://arxiv.org/abs/2503.15404.
Vision Transformers (ViTs) have been widely applied in various computer
vision and vision-language tasks. To gain insights into their robustness in
practical scenarios, transferable adversarial examples on ViTs have been
extensively studied. A typical approach to improving adversarial
transferability is by refining the surrogate model. However, existing work on
ViTs has restricted their surrogate refinement to backward propagation. In this
work, we instead focus on Forward Propagation Refinement (FPR) and specifically
refine two key modules of ViTs: attention maps and token embeddings. For
attention maps, we propose Attention Map Diversification (AMD), which
diversifies certain attention maps and also implicitly imposes beneficial
gradient vanishing during backward propagation. For token embeddings, we
propose Momentum Token Embedding (MTE), which accumulates historical token
embeddings to stabilize the forward updates in both the Attention and MLP
blocks. We conduct extensive experiments with adversarial examples transferred
from ViTs to various CNNs and ViTs, demonstrating that our FPR outperforms the
current best (backward) surrogate refinement by up to 7.0% on average. We also
validate its superiority against popular defenses and its compatibility with
other transfer methods. Codes and appendix are available at
https://github.com/RYC-98/FPR.
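
The two forward-refinement modules described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation (see the linked repository for that): `diversified_attention` randomly rescales attention-map entries during the forward pass, in the spirit of AMD, and `MomentumTokenEmbedding` accumulates token embeddings across attack iterations with a momentum factor `mu`, in the spirit of MTE. The function names, the uniform-scaling rule, and the momentum update form are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def diversified_attention(q, k, rho=0.5, rng=None):
    """AMD-style sketch: perturb an attention map with random per-entry
    scaling to diversify it, then renormalize each row.
    The uniform-scaling rule here is a hypothetical stand-in for the
    paper's diversification operator."""
    rng = rng if rng is not None else np.random.default_rng(0)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores
    attn = softmax(scores)                    # standard attention map
    scale = rng.uniform(1.0 - rho, 1.0 + rho, size=attn.shape)
    attn = attn * scale                       # diversify entries
    return attn / attn.sum(axis=-1, keepdims=True)  # rows sum to 1 again


class MomentumTokenEmbedding:
    """MTE-style sketch: exponential-moving-average accumulation of
    historical token embeddings across attack iterations."""

    def __init__(self, mu=0.9):
        self.mu = mu
        self.hist = None

    def update(self, emb):
        if self.hist is None:
            self.hist = emb
        else:
            self.hist = self.mu * self.hist + (1.0 - self.mu) * emb
        return self.hist
```

In an actual transfer attack, `diversified_attention` would replace selected attention computations inside the surrogate ViT's forward pass, while one `MomentumTokenEmbedding` instance per refined block would smooth the token embeddings fed to the Attention and MLP sublayers across iterations.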