
Post-Completion Learning for Language Models

Source: arXiv
Abstract

Current language model training paradigms typically terminate learning upon reaching the end-of-sequence (<eos>) token, overlooking the potential learning opportunities in the post-completion space. We propose Post-Completion Learning (PCL), a novel training framework that systematically utilizes the sequence space after model output completion to enhance both reasoning and self-evaluation abilities. PCL enables models to continue generating self-assessments and reward predictions during training, while maintaining efficient inference by stopping at the completion point. To fully utilize this post-completion space, we design a white-box reinforcement learning method: the model evaluates its output content according to the reward rules, and the resulting score is computed and aligned with the reward functions for supervision. We implement dual-track SFT to optimize both reasoning and evaluation capabilities, and mix it with RL training to achieve multi-objective hybrid optimization. Experimental results on different datasets and models demonstrate consistent improvements over traditional SFT and RL methods. Our method provides a new technical path for language model training that enhances output quality while preserving deployment efficiency.
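The abstract describes training on a post-completion segment (self-assessment plus a reward prediction aligned with a rule-based reward) that is never generated at inference, since decoding stops at <eos>. The sketch below illustrates how such a dual-track training example might be assembled; it is a minimal illustration only, and the tag names (`<post_eos>`), the exact-match reward rule, and the helper `build_pcl_example` are assumptions, not the paper's actual templates or reward functions.

```python
# Illustrative sketch of the Post-Completion Learning (PCL) idea from the abstract.
# All names, tags, and the toy reward rule below are hypothetical.

EOS = "<eos>"                 # normal completion point; inference stops here
POST_EOS_END = "<post_eos>"   # assumed marker closing the post-completion segment


def rule_based_reward(answer: str, reference: str) -> float:
    """Toy rule-based reward: 1.0 for an exact match, else 0.0 (assumption)."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def build_pcl_example(question: str, reasoning: str, answer: str,
                      reference: str, self_evaluation: str) -> dict:
    """Construct one dual-track training example.

    The supervised target keeps learning after <eos>: the model is also trained
    to emit a self-evaluation and a reward prediction, which can then be aligned
    with the rule-based reward used for RL-style supervision.
    """
    reward = rule_based_reward(answer, reference)

    # Segment seen at inference time: generation stops at <eos>.
    completion = f"{reasoning}\nAnswer: {answer} {EOS}"

    # Post-completion segment, used only during training.
    post_completion = (
        f"Self-evaluation: {self_evaluation}\n"
        f"Predicted reward: {reward:.1f} {POST_EOS_END}"
    )

    return {
        "prompt": question,
        "target": completion + "\n" + post_completion,
        "rule_reward": reward,  # supervision signal for aligning the prediction
    }


example = build_pcl_example(
    question="What is 2 + 3?",
    reasoning="2 plus 3 equals 5.",
    answer="5",
    reference="5",
    self_evaluation="The arithmetic is correct and answers the question.",
)
print(example["target"])
```

Because the post-completion tokens appear only in training targets, inference-time decoding still terminates at <eos>, which is how the method preserves deployment efficiency while adding the extra supervision.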

Xiang Fei, Siqi Wang, Shu Wei, Yuxiang Nie, Wei Shi, Hao Feng, Chao Feng, Can Huang

Computing technology; computer technology

Xiang Fei, Siqi Wang, Shu Wei, Yuxiang Nie, Wei Shi, Hao Feng, Chao Feng, Can Huang. Post-Completion Learning for Language Models [EB/OL]. (2025-08-05) [2025-08-10]. https://arxiv.org/abs/2507.20252.
