Automatically Generating Rules of Malicious Software Packages via Large Language Model

Abstract

Today's security tools predominantly rely on predefined rules crafted by experts, making them poorly adapted to the emergence of software supply chain attacks. To tackle this limitation, we propose a novel tool, RuleLLM, which leverages large language models (LLMs) to automate rule generation for OSS ecosystems. RuleLLM extracts metadata and code snippets from malware as its input, producing YARA and Semgrep rules that can be directly deployed in software development. Specifically, the rule generation task involves three subtasks: crafting rules, refining rules, and aligning rules. To validate RuleLLM's effectiveness, we implemented a prototype system and conducted experiments on a dataset of 1,633 malicious packages. The results are promising: RuleLLM generated 763 rules (452 YARA and 311 Semgrep) with a precision of 85.2% and a recall of 91.8%, outperforming state-of-the-art (SOTA) tools and score-based approaches. We further analyzed the generated rules and proposed a rule taxonomy of 11 categories and 38 subcategories.
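For readers less familiar with the two metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are conventionally computed from detection counts. The function name `precision_recall` and the counts used below are hypothetical and chosen only for illustration; they are not taken from the paper, and the abstract does not state how the authors computed their figures.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion counts.

    tp: malicious packages correctly flagged by a rule
    fp: benign packages incorrectly flagged
    fn: malicious packages that no rule flagged
    """
    # Guard against division by zero when nothing was flagged or nothing was malicious.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not the paper's data).
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f}")
```

Under this convention, high precision means few benign packages are flagged, while high recall means few malicious packages are missed.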

Authors: XiangRui Zhang, HaoYu Chen, Yongzhong He, Wenjia Niu, Qiang Li

Subject classification: computing and computer technology; automation technology and automation equipment

Recommended citation: XiangRui Zhang, HaoYu Chen, Yongzhong He, Wenjia Niu, Qiang Li. Automatically Generating Rules of Malicious Software Packages via Large Language Model [EB/OL]. (2025-04-23) [2025-05-24]. https://arxiv.org/abs/2504.17198.
