
Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation

Source: arXiv

Abstract

Many LLM applications demand efficient structured generation, particularly for LR(1) grammars, to produce outputs in specified formats (e.g., JSON). Existing methods primarily parse LR(1) grammars into a pushdown automaton (PDA), which incurs runtime execution overhead for context-dependent token processing and is especially inefficient under large inference batches. To address these issues, we propose Pre$^3$, which exploits deterministic pushdown automata (DPDA) to improve the efficiency of constrained LLM decoding. First, by precomputing prefix-conditioned edges during preprocessing, Pre$^3$ enables ahead-of-time edge analysis and thus makes parallel transition processing possible. Second, by leveraging the prefix-conditioned edges, Pre$^3$ introduces a novel approach that transforms LR(1) transition graphs into DPDAs, eliminating the need for runtime path exploration and achieving edge transitions with minimal overhead. Pre$^3$ can be seamlessly integrated into standard LLM inference frameworks, reducing time per output token (TPOT) by up to 40% and increasing throughput by up to 36% in our experiments. Our code is available at https://github.com/ModelTC/lightllm.
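
To illustrate the general idea described in the abstract, the sketch below shows DPDA-style constrained greedy decoding in Python: a deterministic transition table is precomputed offline, so at decode time each token only requires a single O(1) lookup and a logit mask, with no runtime path exploration. All names here (DPDATable, allowed_tokens, step, constrained_decode, logits_fn) are hypothetical and are not the actual Pre$^3$ or lightllm API; this is a minimal sketch of the technique, not the paper's implementation.

# Hypothetical sketch of DPDA-based constrained decoding (not the Pre^3/lightllm API).
from dataclasses import dataclass, field


@dataclass
class DPDATable:
    """Deterministic transition table precomputed offline from an LR(1) grammar.

    transitions[(state, token_id)] -> next_state. Determinism means each step
    is a single table lookup, so per-token masking stays cheap even for large
    inference batches.
    """
    transitions: dict[tuple[int, int], int] = field(default_factory=dict)
    start_state: int = 0

    def allowed_tokens(self, state: int, vocab_size: int) -> list[int]:
        # Tokens with a defined outgoing edge from `state`; in a real system
        # this set would itself be precomputed per state rather than scanned.
        return [t for t in range(vocab_size) if (state, t) in self.transitions]

    def step(self, state: int, token_id: int) -> int:
        # Deterministic transition: exactly one successor per (state, token).
        return self.transitions[(state, token_id)]


def constrained_decode(logits_fn, table: DPDATable, vocab_size: int, max_len: int = 32):
    """Greedy decoding restricted to token sequences the DPDA accepts."""
    state, output = table.start_state, []
    for _ in range(max_len):
        allowed = table.allowed_tokens(state, vocab_size)
        if not allowed:
            break  # no legal continuation; the structured output is complete
        logits = logits_fn(output)  # model scores for the next token
        # Mask: pick the highest-scoring token among grammar-legal ones.
        token = max(allowed, key=lambda t: logits[t])
        output.append(token)
        state = table.step(state, token)
    return output

In this sketch, all grammar analysis happens when the table is built, which mirrors the abstract's point that moving edge analysis ahead of time removes per-token path exploration from the decoding loop.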

Junyi Chen, Shihao Bai, Zaijun Wang, Siyu Wu, Chuheng Du, Hailong Yang, Ruihao Gong, Shengzhong Liu, Fan Wu, Guihai Chen

Computing Technology; Computer Technology

Junyi Chen, Shihao Bai, Zaijun Wang, Siyu Wu, Chuheng Du, Hailong Yang, Ruihao Gong, Shengzhong Liu, Fan Wu, Guihai Chen. Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation [EB/OL]. (2025-06-04) [2025-06-17]. https://arxiv.org/abs/2506.03887.