Towards Effective Complementary Security Analysis using Large Language Models
A key challenge in security analysis is the manual evaluation of potential security weaknesses generated by static application security testing (SAST) tools. Numerous false positives (FPs) in these reports reduce the effectiveness of security analysis. We propose using Large Language Models (LLMs) to improve the assessment of SAST findings. We investigate the ability of LLMs to reduce FPs while trying to maintain a perfect true positive rate, using datasets extracted from the OWASP Benchmark (v1.2) and a real-world software project. Our results indicate that advanced prompting techniques, such as Chain-of-Thought and Self-Consistency, substantially improve FP detection. Notably, some LLMs identified approximately 62.5% of FPs in the OWASP Benchmark dataset without missing genuine weaknesses. Combining detections from different LLMs would increase this FP detection to approximately 78.9%. Additionally, we demonstrate our approach's generalizability using a real-world dataset covering five SAST tools, three programming languages, and infrastructure files. The best LLM detected 33.85% of all FPs without missing genuine weaknesses, while combining detections from different LLMs would increase this detection to 38.46%. Our findings highlight the potential of LLMs to complement traditional SAST tools, enhancing automation and reducing resources spent addressing false alarms.
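The abstract names Self-Consistency as one of the prompting techniques that improved false-positive detection. As a minimal illustrative sketch (not the paper's actual pipeline), Self-Consistency means sampling several independent reasoning paths from the model and taking a majority vote; the stubbed `classify_finding` function and the `"sqli-report-42"` identifier below are hypothetical stand-ins for a real LLM call on a SAST finding:

```python
from collections import Counter

def classify_finding(finding: str, sample_id: int) -> str:
    """Stand-in for one sampled LLM completion (temperature > 0).
    In practice this would call an LLM API with a Chain-of-Thought
    prompt asking whether the SAST finding is a true positive (TP)
    or a false positive (FP)."""
    # Hypothetical canned answers, illustrating disagreement across samples.
    canned = {"sqli-report-42": ["FP", "FP", "TP", "FP", "FP"]}
    return canned[finding][sample_id]

def self_consistency_vote(finding: str, n_samples: int = 5) -> str:
    """Self-Consistency: sample several reasoning paths and return the
    majority answer across them."""
    votes = [classify_finding(finding, i) for i in range(n_samples)]
    label, _ = Counter(votes).most_common(1)[0]
    return label

print(self_consistency_vote("sqli-report-42"))  # 4 of 5 samples say FP -> "FP"
```

A conservative variant, in the spirit of the paper's goal of never missing genuine weaknesses, would resolve any disagreement among samples toward "TP" rather than taking a plain majority.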
Jonas Wagner, Simon Müller, Christian Näther, Jan-Philipp Steghöfer, Andreas Both
Subjects: Security Science; Automation Technology and Equipment; Computing and Computer Technology
Jonas Wagner, Simon Müller, Christian Näther, Jan-Philipp Steghöfer, Andreas Both. Towards Effective Complementary Security Analysis using Large Language Models [EB/OL]. (2025-06-20) [2025-06-30]. https://arxiv.org/abs/2506.16899.