Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning
Deep neural networks have achieved remarkable success across various applications; however, their vulnerability to backdoor attacks poses severe security risks -- especially in situations where only a limited set of clean samples is available for defense. In this work, we address this critical challenge by proposing ULRL (UnLearn and ReLearn for backdoor removal), a novel two-phase approach for comprehensive backdoor removal. Our method first employs an unlearning phase, in which the network's loss is intentionally maximized on a small clean dataset to expose neurons that are excessively sensitive to backdoor triggers. Subsequently, in the relearning phase, these suspicious neurons are recalibrated using targeted reinitialization and cosine similarity regularization, effectively neutralizing backdoor influences while preserving the model's performance on benign data. Extensive experiments with 12 backdoor types on multiple datasets (CIFAR-10, CIFAR-100, GTSRB, and Tiny-ImageNet) and architectures (PreAct-ResNet18, VGG19-BN, and ViT-B-16) demonstrate that ULRL significantly reduces the attack success rate without compromising clean accuracy -- even when only 1% of clean data is used for defense.
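The abstract describes the two phases concretely enough to sketch in code. Below is a minimal PyTorch sketch of the unlearn/relearn loop under stated assumptions: the drift-based neuron scoring, the `topk_frac` selection ratio, the reinitialization scheme, and the `cos_weight` penalty are all illustrative choices, not the paper's exact criteria or hyperparameters.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of the two ULRL phases for a PyTorch classifier
# and a small loader of clean samples. Function and parameter names
# (unlearn_and_score, relearn, topk_frac, cos_weight) are assumptions
# for illustration; the paper's neuron-selection rule and loss
# weighting may differ.

def unlearn_and_score(model, clean_loader, steps=50, lr=1e-2):
    """Phase 1: maximize the clean-data loss (gradient ascent) and
    score each neuron by how far its weights drift; neurons that are
    overly sensitive to the trigger are assumed to drift the most."""
    ref = {n: p.detach().clone() for n, p in model.named_parameters()}
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    data = iter(clean_loader)
    for _ in range(steps):
        try:
            x, y = next(data)
        except StopIteration:
            data = iter(clean_loader)
            x, y = next(data)
        opt.zero_grad()
        (-F.cross_entropy(model(x), y)).backward()  # ascend the loss
        opt.step()
    scores = {}
    for n, p in model.named_parameters():
        if p.dim() >= 2:  # weight tensors: one score per output neuron
            drift = (p.detach() - ref[n]).abs()
            scores[n] = drift.flatten(1).mean(dim=1)
    return ref, scores

def relearn(model, ref, scores, clean_loader, topk_frac=0.05,
            epochs=10, lr=1e-2, cos_weight=1.0):
    """Phase 2: restore the original weights, reinitialize the
    top-drifting (suspicious) neurons, then fine-tune on clean data
    with a cosine-similarity penalty that keeps the retrained weights
    away from their original (suspect) directions."""
    suspects = {}
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.copy_(ref[n])  # undo the unlearning ascent
            if n in scores:
                k = max(1, int(topk_frac * scores[n].numel()))
                idx = scores[n].topk(k).indices
                p[idx] = torch.randn_like(p[idx]) * p.std()  # reinit suspects
                suspects[n] = idx
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in clean_loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            for n, p in model.named_parameters():
                if n in suspects:  # penalize similarity to old directions
                    new_w = p[suspects[n]].flatten(1)
                    old_w = ref[n][suspects[n]].flatten(1)
                    loss = loss + cos_weight * F.cosine_similarity(
                        new_w, old_w, dim=1).abs().mean()
            loss.backward()
            opt.step()
    return model
```

A usage run would chain the two phases on the same small clean loader, e.g. `ref, scores = unlearn_and_score(model, loader)` followed by `relearn(model, ref, scores, loader)`; the key design point is that unlearning is used only to *identify* suspicious neurons, after which the model is restored and only those neurons are recalibrated.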
Jun Sun, Nay Myat Min, Long H. Pham
Computing technology; computer technology
Jun Sun, Nay Myat Min, Long H. Pham. Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning [EB/OL]. (2025-06-24) [2025-07-16]. https://arxiv.org/abs/2405.14781.