Learning to Focus: Context Extraction for Efficient Code Vulnerability Detection with Language Models
Language models (LMs) show promise for vulnerability detection but struggle with long, real-world code due to sparse and uncertain vulnerability locations. These issues, exacerbated by token limits, often cause models to miss vulnerability-related signals, thereby impairing effective learning. A key intuition is to enhance LMs with concise, information-rich context. Commit-based annotations offer precise, CWE-agnostic supervision, but are unavailable during inference, as they depend on historical code changes. Moreover, their extreme sparsity, often covering only a few lines, makes it difficult for LMs to process directly. In this paper, we propose FocusVul, a model-agnostic framework that improves LM-based vulnerability detection by learning to select sensitive context. FocusVul learns commit-based annotation patterns through hierarchical semantic modeling and generalizes them to identify line-level vulnerability-relevant regions during inference. It then extracts LM-oriented context via both dependency and execution flows surrounding selected regions, yielding semantically rich inputs for effective vulnerability detection. Experiments on real-world benchmarks show that FocusVul consistently outperforms heuristic-based and full-function fine-tuning approaches, improving classification performance by 164.04% and reducing FLOPs by 19.12% on average.
Shuo Yang, Yiling He, Suman Jana, Lorenzo Cavallaro, Huichi Zhou, Xinran Zheng, Xingzhi Qian
Computing Technology; Computer Technology
Shuo Yang, Yiling He, Suman Jana, Lorenzo Cavallaro, Huichi Zhou, Xinran Zheng, Xingzhi Qian. Learning to Focus: Context Extraction for Efficient Code Vulnerability Detection with Language Models [EB/OL]. (2025-07-08) [2025-07-16]. https://arxiv.org/abs/2505.17460