Breaking the Myth: Can Small Models Infer Postconditions Too?
Formal specifications are essential for ensuring software correctness, yet writing them manually is tedious and error-prone. Large Language Models (LLMs) have shown promise in generating such specifications from natural-language intents, but their enormous size and high computational demands raise a fundamental question: do we really need large models for this task? In this paper, we show that a small, fine-tuned language model can achieve high-quality postcondition generation at much lower computational cost. We construct a specialized dataset of prompts, reasoning logs, and postconditions, and use it for supervised fine-tuning of a 7B-parameter code model. Our approach handles real-world repository dependencies and preserves pre-state information, allowing for expressive and accurate specifications. We evaluate the model on a benchmark of real-world Java bugs (Defects4J) and compare against both proprietary giants (e.g., GPT-4o) and large open-source models. Empirical results demonstrate that our compact model matches or outperforms significantly larger counterparts in syntactic correctness, semantic correctness, and bug-distinguishing capability. These findings highlight that targeted fine-tuning on a modest dataset can enable small models to achieve results formerly seen only in massive, resource-heavy LLMs, offering a practical and efficient path toward the real-world adoption of automated specification generation.
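For readers unfamiliar with the task, the following is a hypothetical illustration (not taken from the paper) of what inferring a postcondition for a Java method might look like. The method name `max`, the class name, and the specific assertion are all invented for exposition; the snippet only shows the general shape of the problem, including the pre-state preservation the abstract mentions. Run with `java -ea` so the assertion is checked.

```java
// Hypothetical illustration of postcondition inference; names and the
// assertion below are invented, not drawn from the paper's dataset.
import java.util.Arrays;

public class PostconditionExample {

    /** Intent (natural language): return the largest element of a non-empty array. */
    static int max(int[] xs) {
        int best = xs[0];
        for (int x : xs) {
            if (x > best) best = x;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] input = {3, 7, 2};
        // Preserve pre-state: a postcondition may refer to the inputs as they
        // were *before* the call, so snapshot them first.
        int[] old_xs = Arrays.copyOf(input, input.length);

        int result = max(input);

        // A postcondition a model might generate from the intent: the result
        // is an element of the original array, and no element exceeds it.
        assert Arrays.stream(old_xs).anyMatch(x -> x == result)
            && Arrays.stream(old_xs).allMatch(x -> x <= result);

        System.out.println("postcondition holds for result = " + result);
    }
}
```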
Gehao Zhang, Zhenting Wang, Juan Zhai
Computing Technology, Computer Technology
Gehao Zhang, Zhenting Wang, Juan Zhai. Breaking the Myth: Can Small Models Infer Postconditions Too? [EB/OL]. (2025-07-14) [2025-07-23]. https://arxiv.org/abs/2507.10182.