
Towards Robust LLMs: an Adversarial Robustness Measurement Framework

Source: arXiv
Abstract

The rise of Large Language Models (LLMs) has revolutionized artificial intelligence, yet these models remain vulnerable to adversarial perturbations, undermining their reliability in high-stakes applications. While adversarial robustness in vision-based neural networks has been extensively studied, LLM robustness remains under-explored. We adapt the Robustness Measurement and Assessment (RoMA) framework to quantify LLM resilience against adversarial inputs without requiring access to model parameters. By comparing RoMA's estimates to those of formal verification methods, we demonstrate its accuracy with minimal error margins while maintaining computational efficiency. Our empirical evaluation reveals that robustness varies significantly not only between different models but also across categories within the same task and between various types of perturbations. This non-uniformity underscores the need for task-specific robustness evaluations, enabling practitioners to compare and select models based on application-specific robustness requirements. Our work provides a systematic methodology to assess LLM robustness, advancing the development of more reliable language models for real-world deployment.
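The abstract's core idea, estimating robustness from model outputs alone without access to model parameters, can be illustrated with a minimal sketch. The snippet below is not the authors' RoMA implementation; it is a simplified Monte Carlo illustration assuming a hypothetical black-box `model_predict` callable and a toy character-swap `perturb` function, and it omits RoMA's statistical machinery for fitting confidence-score distributions.

```python
import random

# Hypothetical illustration (not the paper's implementation): a simple
# Monte Carlo estimate of local robustness for a black-box text classifier.
# `model_predict` stands in for any API mapping a prompt to a label.

def perturb(text: str, rng: random.Random) -> str:
    """Apply a small character-level perturbation (swap two adjacent characters)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def estimate_robustness(model_predict, text: str, n_samples: int = 1000, seed: int = 0) -> float:
    """Return the fraction of perturbed inputs on which the model's label is unchanged."""
    rng = random.Random(seed)
    reference = model_predict(text)
    unchanged = sum(
        1 for _ in range(n_samples)
        if model_predict(perturb(text, rng)) == reference
    )
    return unchanged / n_samples
```

Per-input estimates of this kind, aggregated per category and per perturbation type, are what enable the task-specific robustness comparisons the abstract describes.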

Guy Katz, Natan Levy, Adiel Ashrov

Subject: Computing Technology, Computer Technology

Guy Katz, Natan Levy, Adiel Ashrov. Towards Robust LLMs: an Adversarial Robustness Measurement Framework [EB/OL]. (2025-04-24) [2025-05-04]. https://arxiv.org/abs/2504.17723.
