Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning
Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning
Despite recent advances in the reasoning capabilities of Large Language Models (LLMs), improving the reasoning ability of Small Language Models (SLMs, e.g., $\leq$ 1.5B) remains challenging. A key obstacle lies in the complexity and variability of natural language: essentially equivalent problems often appear in diverse surface forms, often obscured by redundant or distracting details. This imposes a dual burden on SLMs: they must first extract the core problem from complex linguistic input, and then perform reasoning based on that understanding. The resulting vast and noisy problem space hinders optimization, particularly for models with limited capacity. To address this, we propose a new framework that decouples understanding from reasoning by mapping natural language problems into a canonical problem space-a semantically simplified yet expressive domain. This enables SLMs to focus on reasoning over standardized inputs, free from linguistic variability. Within this framework, we introduce DURIT (Decoupled Understanding from Reasoning via Iterative Training), a three-step algorithm that iteratively: (1) mapping natural language problems via reinforcement learning, (2) aligns reasoning trajectories through self-distillation, and (3) trains reasoning policies in the problem space. The mapper and reasoner are co-trained in an alternating loop throughout this process. Experiments show that DURIT substantially improves SLMs' performance on both in-domain and out-of-domain mathematical and logical reasoning tasks. Beyond improving reasoning capabilities, DURIT also improves the robustness of reasoning, validating decoupling understanding from reasoning as an effective strategy for strengthening SLMs.
Li Wang、Changhao Zhang、Zengqi Xiu、Kai Lu、Xin Yu、Kui Zhang、Wenjun Wu
计算技术、计算机技术
Li Wang,Changhao Zhang,Zengqi Xiu,Kai Lu,Xin Yu,Kui Zhang,Wenjun Wu.Decoupling Understanding from Reasoning via Problem Space Mapping for Small-scale Model Reasoning[EB/OL].(2025-08-07)[2025-08-24].https://arxiv.org/abs/2508.10019.点此复制
评论