National Preprint Platform

PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data

Source: arXiv
Abstract

High-quality mathematical and logical datasets with verifiable answers are essential for strengthening the reasoning capabilities of large language models (LLMs). While recent data augmentation techniques have facilitated the creation of large-scale benchmarks, existing LLM-generated datasets often suffer from limited reliability, diversity, and scalability. To address these challenges, we introduce PuzzleClone, a formal framework for synthesizing verifiable data at scale using Satisfiability Modulo Theories (SMT). Our approach features three key innovations: (1) encoding seed puzzles into structured logical specifications, (2) generating scalable variants through systematic randomization of variables and constraints, and (3) ensuring validity via a reproduction mechanism. Using PuzzleClone, we construct a curated benchmark of over 83K diverse, programmatically validated puzzles. The generated puzzles span a wide range of difficulty levels and formats, posing significant challenges to current state-of-the-art models. We post-train models on the PuzzleClone datasets with supervised fine-tuning (SFT) and reinforcement learning (RL). Experimental results show that training on PuzzleClone yields substantial improvements not only on the PuzzleClone test set but also on logic and mathematical benchmarks. Post-training raises the average PuzzleClone score from 14.4 to 56.2 and delivers consistent gains across 7 logic and mathematical benchmarks, of up to 12.5 absolute percentage points (AMC2023: from 52.5 to 65.0). Our code and data are available at https://github.com/HiThink-Research/PuzzleClone.
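The three steps named in the abstract are to encode a seed puzzle as a logical specification, randomize its variables and constraints, and keep only variants that check out. They can be illustrated with a toy ordering puzzle. What follows is a minimal Python sketch under stated assumptions, not PuzzleClone's code.

- **Solver swap.** It replaces the SMT solver with exhaustive enumeration over permutations, which only scales to a handful of entities.
- **Acceptance test.** Accepting a variant only when it has exactly one solution is an assumption made for illustration.
- **Hypothetical names.** The clue templates and the names `Constraint`, `all_solutions`, `random_constraint`, and `make_variant` are all invented for this sketch.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical encoding: each entity name maps to its 1-based position in a line.
Assignment = dict[str, int]


@dataclass(frozen=True)
class Constraint:
    """A natural-language clue paired with its formal check (hypothetical)."""
    text: str
    holds: Callable[[Assignment], bool]


def all_solutions(entities: list[str], constraints: list[Constraint]) -> list[Assignment]:
    """Return every assignment that satisfies all constraints.

    Brute-force enumeration stands in for an SMT solver call here;
    it is only practical for a handful of entities."""
    solutions = []
    for perm in itertools.permutations(range(1, len(entities) + 1)):
        assignment = dict(zip(entities, perm))
        if all(c.holds(assignment) for c in constraints):
            solutions.append(assignment)
    return solutions


def random_constraint(entities: list[str], rng: random.Random) -> Constraint:
    """Sample one clue from a small, hypothetical template family."""
    a, b = rng.sample(entities, 2)
    if rng.random() < 0.5:
        # Default arguments bind the sampled names into the lambda.
        return Constraint(f"{a} stands somewhere before {b}.",
                          lambda s, a=a, b=b: s[a] < s[b])
    k = rng.randint(1, len(entities))
    return Constraint(f"{a} is not in position {k}.",
                      lambda s, a=a, k=k: s[a] != k)


def make_variant(entities: list[str], seed: int,
                 max_attempts: int = 50) -> Optional[tuple[list[Constraint], Assignment]]:
    """Add random clues until the puzzle has exactly one solution.

    Returns (clues, unique_answer), or None if the attempt budget runs out."""
    rng = random.Random(seed)  # seeded, so each variant is reproducible
    clues: list[Constraint] = []
    remaining = len(all_solutions(entities, clues))  # n! before any clue
    for _ in range(max_attempts):
        candidate = clues + [random_constraint(entities, rng)]
        solutions = all_solutions(entities, candidate)
        if not solutions or len(solutions) == remaining:
            continue  # contradictory or redundant clue: discard it
        clues, remaining = candidate, len(solutions)
        if remaining == 1:
            return clues, solutions[0]
    return None


if __name__ == "__main__":
    result = make_variant(["Ann", "Bob", "Cid", "Dee"], seed=3)
    if result is not None:
        clues, answer = result
        for clue in clues:
            print("-", clue.text)
        print("Answer:", sorted(answer, key=answer.get))
```

Each seed yields a different but reproducible variant, and the returned answer is checked against the formal constraints rather than produced by a model. With a real SMT solver, one common way to test uniqueness is as follows. Ask for one model, add a clause that excludes exactly that model, and confirm that the strengthened problem is unsatisfiable.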

Kai Xiong, Yanwei Huang, Rongjunchen Zhang, Kun Chen, Haipang Wu

Subjects: Mathematical Computing Technology; Computer Technology

Kai Xiong, Yanwei Huang, Rongjunchen Zhang, Kun Chen, Haipang Wu. PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data[EB/OL]. (2025-08-25)[2025-09-02]. https://arxiv.org/abs/2508.15180.