National Preprint Platform

Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges

Source: arXiv
Abstract

Large Language Models (LLMs) still struggle with the structured reasoning and tool-assisted computation needed for problem solving in cybersecurity applications. In this work, we introduce "random-crypto", a cryptographic Capture-the-Flag (CTF) challenge-generator framework that we use to fine-tune a tool-augmented Llama-3.1-8B with Guided Reinforcement Prompt Optimisation (GRPO), allowing the agent to iteratively write and execute Python inside an isolated REPL. GRPO yields a +53 percentage-point absolute jump in Pass@8 on unseen "random-crypto" tasks (0.35 -> 0.88) and raises Majority@8 to 0.41. The fine-tuned agent also generalizes to an external dataset: on a subset of picoCTF cryptography problems, it improves Pass@8 by +13 pp. Ablations show the gains stem from more reliable tool invocation and code synthesis, rather than superficial prompt adaptation.
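The Pass@k and Majority@k figures in the abstract follow standard evaluation conventions. As a minimal sketch (the helper names below are ours, not from the paper): Pass@k is typically computed with the unbiased estimator of Chen et al. (2021) over n sampled generations of which c are correct, and Majority@k scores 1 when the most frequent answer among k samples matches the reference.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    are correct, solves the task (Chen et al., 2021)."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def majority_at_k(answers: list[str], reference: str) -> float:
    """Majority@k: 1.0 if the most common answer among the k samples
    equals the reference answer, else 0.0."""
    top_answer, _ = Counter(answers).most_common(1)[0]
    return 1.0 if top_answer == reference else 0.0
```

For example, `pass_at_k(8, 4, 8)` is 1.0 (at least one of the 8 samples is correct whenever c > 0 and k = n), while `pass_at_k(2, 1, 1)` is 0.5.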

Lajos Muzsai, David Imolai, András Lukács

Subject: Computing Technology, Computer Technology

Lajos Muzsai, David Imolai, András Lukács. Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges [EB/OL]. (2025-05-31) [2025-06-24]. https://arxiv.org/abs/2506.02048.
