National Preprint Platform

Large Language Model-Driven Concolic Execution for Highly Structured Test Input Generation

Source: arXiv
Abstract

How can we perform concolic execution to generate highly structured test inputs for systematically testing parsing programs? Existing concolic execution engines are significantly restricted by (1) input structure-agnostic path constraint selection, which wastes testing effort or misses coverage; (2) limited constraint-solving capability, which yields many syntactically invalid test inputs; and (3) reliance on manually acquired highly structured seed inputs, which prevents continuous testing. This paper proposes Cottontail, a new Large Language Model (LLM)-driven concolic execution engine, to mitigate these limitations. First, a more complete program path representation, named Expressive Structural Coverage Tree (ESCT), is constructed to select structure-aware path constraints. Next, an LLM-driven constraint solver based on a Solve-Complete paradigm is designed to solve the path constraints and produce test inputs that not only satisfy the constraints but also conform to the input syntax. Finally, history-guided seed acquisition is employed to obtain new highly structured test inputs either before testing starts or after testing saturates. We implemented Cottontail on top of SymCC and evaluated it on eight extensively tested open-source libraries across four input formats (XML, SQL, JavaScript, and JSON). The results are promising: Cottontail outperforms the state-of-the-art approaches SymCC and Marco by 14.15% and 14.31%, respectively, in line coverage. In addition, Cottontail found six previously unknown vulnerabilities, all of which have been assigned new CVEs. We have reported these issues to the developers, and four of them have been fixed so far.
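The abstract describes the Solve-Complete paradigm only at a high level. As a rough illustration (a simplified reading of the abstract, not Cottontail's implementation), the Python sketch below models a path constraint as a hypothetical `(offset, char)` pair, uses a naive character-overwriting `solve` step in place of a real SMT solver, and replaces the LLM with a hypothetical `naive_completer` that proposes repaired inputs. Python's `json.loads` serves as the syntax check, so an input is accepted only if it both satisfies the constraints and parses as JSON.

```python
import json
from typing import Callable, Iterable, Iterator, Optional

# Hypothetical simplified path constraint: the character at `offset`
# must equal `char`. Real concolic path constraints are SMT formulas.
Constraint = tuple[int, str]


def satisfies(text: str, constraints: Iterable[Constraint]) -> bool:
    """Return True if every (offset, char) constraint holds in `text`."""
    return all(off < len(text) and text[off] == ch for off, ch in constraints)


def is_valid_json(text: str) -> bool:
    """Syntax check standing in for 'conforms to the input grammar'."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def solve(seed: str, constraints: list[Constraint]) -> str:
    """'Solve' step: a naive stand-in for a constraint solver that overwrites
    the constrained positions of the seed, padding with spaces if needed."""
    chars = list(seed)
    for off, ch in constraints:
        if off >= len(chars):
            chars.extend(" " * (off + 1 - len(chars)))
        chars[off] = ch
    return "".join(chars)


def naive_completer(text: str) -> Iterator[str]:
    """Hypothetical stand-in for the LLM: keep a prefix of the solved text
    and try a few fixed closing suffixes."""
    suffixes = ["", "}", "]}", '"}']
    for cut in range(len(text), 0, -1):
        for suffix in suffixes:
            yield text[:cut] + suffix


def solve_complete(
    seed: str,
    constraints: list[Constraint],
    propose: Callable[[str], Iterable[str]],
) -> Optional[str]:
    """Return an input that satisfies the constraints AND parses as JSON,
    or None if no proposed completion works."""
    candidate = solve(seed, constraints)
    if satisfies(candidate, constraints) and is_valid_json(candidate):
        return candidate
    # 'Complete' step: ask the proposer for repairs and keep the first one
    # that still satisfies the path constraints and is syntactically valid.
    for completion in propose(candidate):
        if satisfies(completion, constraints) and is_valid_json(completion):
            return completion
    return None


print(solve_complete('{"a": 1}', [(6, "[")], naive_completer))  # {"a": []}
```

Here the solved input `{"a": [}` violates JSON syntax, and the completion step repairs it to `{"a": []}` while keeping the constrained `[` at offset 6. Keeping the constraint and syntax checks outside the proposer means a wrong suggestion from the completer is rejected rather than emitted as a test input.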

Haoxin Tu, Seongmin Lee, Yuxian Li, Peng Chen, Lingxiao Jiang, Marcel Böhme

Computing technology; computer technology

Haoxin Tu, Seongmin Lee, Yuxian Li, Peng Chen, Lingxiao Jiang, Marcel Böhme. Large Language Model-Driven Concolic Execution for Highly Structured Test Input Generation [EB/OL]. (2025-04-24) [2025-05-10]. https://arxiv.org/abs/2504.17542.
