A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search

Source: arXiv
Abstract

Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field's recent progress, inference costs now represent a significant and growing component of the overall resource burden, particularly for reasoning-focused models. Existing characterizations of compute-optimality that consider model size, dataset size, and inference tokens in isolation or in fixed combinations risk overlooking more efficient operating points. We introduce directed stochastic skill search (DS3), a general framework that represents inference as stochastic traversal over a learned skill graph. From a simplified yet expressive instantiation, we derive closed-form expressions for task success and compute cost across a wide range of inference strategies -- including chain-of-thought (CoT) and tree-of-thought (ToT) -- enabling comparative analysis as a function of task difficulty and model capability. To that end, we extend a prior first-principles tripartite graph framework of LLM training to incorporate inference, and separately bridge DS3 with empirical methods that characterize LLM scaling behavior. We theoretically recover empirically observed patterns, including: linear accuracy scaling with logarithmic compute; variation in preferred inference strategies as a function of task difficulty and model capability; emergent behavior elicited by reasoning even when performance plateaus under parameter scaling; and both best-of-N (BoN) and majority voting behavior captured within a unified analytical framework. By explicitly characterizing training-inference interdependencies, our framework deepens theoretical understanding and supports principled algorithmic design and resource allocation.

Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney

Computing technology, computer technology

Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney. A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search [EB/OL]. (2025-07-10) [2025-07-20]. https://arxiv.org/abs/2507.00004.
