
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities


Source: arXiv
Abstract

As Large Language Models (LLMs) become increasingly integrated into real-world decision-making systems, understanding their behavioural vulnerabilities remains a critical challenge for AI safety and alignment. While existing evaluation metrics focus primarily on reasoning accuracy or factual correctness, they often overlook whether LLMs are robust to adversarial manipulation or capable of adapting their strategies in dynamic environments. This paper introduces an adversarial evaluation framework designed to systematically stress-test the decision-making processes of LLMs under interactive and adversarial conditions. Drawing on methodologies from cognitive psychology and game theory, our framework probes how models respond in two canonical tasks: the two-armed bandit task and the Multi-Round Trust Task. These tasks capture key aspects of exploration-exploitation trade-offs, social cooperation, and strategic flexibility. We apply this framework to several state-of-the-art LLMs, including GPT-3.5, GPT-4, Gemini-1.5, and DeepSeek-V3, revealing model-specific susceptibilities to manipulation and rigidity in strategy adaptation. Our findings highlight distinct behavioural patterns across models and emphasize the importance of adaptability and fairness recognition for trustworthy AI deployment. Rather than offering a performance benchmark, this work proposes a methodology for diagnosing decision-making weaknesses in LLM-based agents, providing actionable insights for alignment and safety research.
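The abstract does not include implementation details, so the following is only a minimal illustrative sketch of how a two-armed bandit probe with an adversarial mid-game reward switch might be set up; the `query_llm` stub, prompts, and payoff probabilities are assumptions, not the authors' actual protocol.

```python
import random

# Illustrative sketch only: the prompts, payoff probabilities, and `query_llm`
# stub below are assumptions for clarity, not the paper's published setup.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM API (e.g. GPT-4, Gemini, DeepSeek).
    Replace with a real client; here it guesses randomly so the script runs."""
    return random.choice(["A", "B"])

def two_armed_bandit_probe(rounds: int = 40, switch_at: int = 20) -> list:
    """Run a two-armed bandit task in which the better arm is adversarially
    switched mid-game, to see whether the model re-explores or stays rigid."""
    history = []   # (choice, reward) pairs shown back to the model each round
    choices = []
    for t in range(rounds):
        # Adversarial manipulation: arm A pays off more often before the
        # switch point, arm B afterwards.
        p_reward = {"A": 0.8, "B": 0.2} if t < switch_at else {"A": 0.2, "B": 0.8}

        prompt = (
            "You are playing a two-armed bandit game. "
            f"Past plays as (choice, reward) pairs: {history}. "
            "Reply with exactly one letter, A or B."
        )
        choice = query_llm(prompt).strip().upper()
        if choice not in ("A", "B"):
            choice = "A"  # fall back on malformed output

        reward = 1 if random.random() < p_reward[choice] else 0
        history.append((choice, reward))
        choices.append(choice)
    return choices

if __name__ == "__main__":
    picks = two_armed_bandit_probe()
    # A rigid model keeps choosing the pre-switch arm after round 20.
    print("Post-switch picks:", picks[20:])
```

A similar interactive loop could drive the Multi-Round Trust Task, with the adversarial counterpart varying its repayment behaviour across rounds and the transcript fed back to the model each turn.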

Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward

Subject: Computing Technology, Computer Technology

Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward. Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities [EB/OL]. (2025-05-19) [2025-07-16]. https://arxiv.org/abs/2505.13195.
