Adversarial Example Detection Method Based on Activation Features
Deep neural networks have brought great convenience to humans, but their inherent fragility poses serious security risks in deployment. An attacker can add tiny malicious perturbations to an input example to induce the classifier to misclassify it; detecting adversarial perturbations in inputs is therefore a fundamental requirement of any robust classification framework. This paper proposes an adversarial example detection method based on activation features. Its key idea is that, compared with clean examples, adversarial examples exhibit an abnormal trend during forward propagation inside the deep neural network, and this trend can be extracted to decide whether an input is adversarial. The method extracts activation features from the classifier's intermediate layers, projects them to a unified dimension, concatenates them into a sequence, and then uses a long short-term memory (LSTM) network to automatically extract the features that distinguish clean examples from adversarial ones, which gives the method good interpretability. Experiments are conducted on the standard CIFAR-10 dataset, and visualization methods are used to analyze how the detection model works.
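The pipeline described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' actual model: all layer sizes, the projection modules, and the class name `ActivationDetector` are assumptions chosen for the example, with PyTorch standing in for whatever framework the paper uses.

```python
# Hypothetical sketch: intermediate-layer activations are projected to a
# common dimension, stacked as a depth-ordered sequence, and fed to an
# LSTM that classifies the input as clean vs. adversarial.
import torch
import torch.nn as nn

class ActivationDetector(nn.Module):
    def __init__(self, layer_dims, unified_dim=64, hidden_dim=32):
        super().__init__()
        # One linear projection per monitored layer maps its flattened
        # activation vector to the unified dimension.
        self.projections = nn.ModuleList(
            nn.Linear(d, unified_dim) for d in layer_dims
        )
        # The LSTM reads the per-layer features as a sequence ordered by
        # network depth, mirroring forward propagation.
        self.lstm = nn.LSTM(unified_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # logits: clean vs. adversarial

    def forward(self, activations):
        # activations: list of tensors, one per layer, each (batch, layer_dim)
        feats = [proj(a) for proj, a in zip(self.projections, activations)]
        seq = torch.stack(feats, dim=1)   # (batch, n_layers, unified_dim)
        _, (h_n, _) = self.lstm(seq)
        return self.head(h_n[-1])         # (batch, 2)

# Toy usage with random activations from three hypothetical layers.
detector = ActivationDetector(layer_dims=[256, 128, 64])
acts = [torch.randn(8, d) for d in (256, 128, 64)]
logits = detector(acts)
print(logits.shape)  # torch.Size([8, 2])
```

In practice the `activations` list would be captured from the protected classifier (e.g. via forward hooks), and the detector would be trained with cross-entropy on a labeled set of clean and adversarially perturbed examples.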
Li Tianxiang (李天祥)
Subject: Computing Technology, Computer Technology
Keywords: deep neural network; adversarial example; adversarial detection
Li Tianxiang. Adversarial Example Detection Method Based on Activation Features [EB/OL]. (2022-11-23) [2025-08-21]. http://www.paper.edu.cn/releasepaper/content/202211-61.