KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
We propose a novel K-step return estimation method, called KETCHUP, for reinforcement learning (RL)-based knowledge distillation (KD) in text generation tasks. Our idea is to induce a K-step return by applying the Bellman optimality equation over multiple steps. Theoretical analysis shows that this K-step formulation reduces the variance of the gradient estimates, thus leading to improved RL optimization, especially when the student model is large. Empirical evaluation on three text generation tasks demonstrates that our approach yields superior performance in both standard task metrics and large language model (LLM)-based evaluation. These results suggest that K-step return induction offers a promising direction for enhancing RL-based KD in LLM research.
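The abstract does not spell out the estimator itself, but a standard K-step return obtained by unrolling the Bellman optimality equation would read as follows (a minimal sketch assuming a discounted MDP with per-step reward r_t, discount factor \gamma, and optimal action-value function Q^*; the paper's exact formulation may differ):

G_t^{(K)} = \sum_{i=0}^{K-1} \gamma^{i} r_{t+i} + \gamma^{K} \max_{a} Q^{*}(s_{t+K}, a)

For K = 1 this reduces to the familiar one-step Bellman optimality target r_t + \gamma \max_a Q^*(s_{t+1}, a); per the abstract, the paper's theoretical analysis argues that the multi-step form lowers the variance of the resulting gradient estimates.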
Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou
Subject categories: computing technology; computer technology
Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou. KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation [EB/OL]. (2025-04-26) [2025-05-28]. https://arxiv.org/abs/2504.19024.