
Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack


Source: arXiv
English Abstract

We extend BeamAttack, an adversarial attack algorithm designed to evaluate the robustness of text classification systems through word-level modifications guided by beam search. Our extensions include support for word deletions and the option to skip substitutions, enabling the discovery of minimal modifications that alter model predictions. We also integrate LIME to better prioritize word replacements. Evaluated across multiple datasets and victim models (BiLSTM, BERT, and adversarially trained RoBERTa) within the BODEGA framework, our approach achieves over a 99% attack success rate while preserving the semantic and lexical similarity of the original texts. Through both quantitative and qualitative analysis, we highlight BeamAttack's effectiveness and its limitations. Our implementation is available at https://github.com/LucK1Y/BeamAttack
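The core idea described above can be sketched as a beam search over per-word edits: each word is either kept (skipped), deleted, or substituted, and candidate texts are ranked against the victim classifier while preferring fewer modifications. The sketch below is a minimal, hypothetical illustration under toy assumptions; the `toy_victim_score` classifier, the substitution lexicon, and the edit-count tie-breaker are stand-ins, not BeamAttack's actual components, and the LIME-based ordering of replacement positions is omitted.

```python
def toy_victim_score(words, triggers=frozenset({"fake", "hoax"})):
    """Toy stand-in for a victim model's 'misinformation' probability:
    the fraction of trigger words present in the text."""
    return sum(w in triggers for w in words) / max(len(words), 1)


def beam_search_attack(words, substitutions, beam_width=5, threshold=0.1):
    """Beam search over per-word edits: keep (skip), delete, or substitute.

    Each beam entry is (edited_prefix, num_edits). Entries are ranked by
    the victim score first and the edit count second, so the search
    prefers minimal modifications that still flip the prediction."""
    beams = [([], 0)]
    for w in words:
        expanded = []
        for prefix, edits in beams:
            expanded.append((prefix + [w], edits))       # keep the word (skip)
            expanded.append((prefix, edits + 1))         # delete the word
            for sub in substitutions.get(w, ()):         # word substitution
                expanded.append((prefix + [sub], edits + 1))
        expanded.sort(key=lambda e: (toy_victim_score(e[0]), e[1]))
        beams = expanded[:beam_width]                    # prune to beam width
    best_tokens, _ = beams[0]
    # Attack succeeds if the victim score drops below the decision threshold.
    return best_tokens if toy_victim_score(best_tokens) < threshold else None
```

For example, attacking `"this is a fake hoax"` with substitutions `{"fake": ["dubious"], "hoax": ["claim"]}` yields an edited text containing no trigger words, using two edits rather than wholesale deletion, because ties in victim score are broken by edit count.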

Arnisa Fazla, Lucas Krauter, David Guzman Piedrahita, Andrianos Michail

Subject: Computing Technology, Computer Technology

Arnisa Fazla, Lucas Krauter, David Guzman Piedrahita, Andrianos Michail. Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack [EB/OL]. (2025-07-03) [2025-07-16]. https://arxiv.org/abs/2506.23661
