National Preprint Platform

Distilling Realizable Students from Unrealizable Teachers

Source: arXiv
Abstract

We study policy distillation under privileged information, where a student policy with only partial observations must learn from a teacher with full-state access. A key challenge is information asymmetry: the student cannot directly access the teacher's state space, leading to distributional shifts and policy degradation. Existing approaches either modify the teacher to produce realizable but sub-optimal demonstrations or rely on the student to explore missing information independently, both of which are inefficient. Our key insight is that the student should strategically interact with the teacher (querying only when necessary and resetting from recovery states) to stay on a recoverable path within its own observation space. We introduce two methods: (i) an imitation learning approach that adaptively determines when the student should query the teacher for corrections, and (ii) a reinforcement learning approach that selects where to initialize training for efficient exploration. We validate our methods in both simulated and real-world robotic tasks, demonstrating significant improvements over standard teacher-student baselines in training efficiency and final performance. The project website is available at: https://portal-cornell.github.io/CritiQ_ReTRy/
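To make the "query only when necessary" idea in method (i) concrete, here is a minimal, hypothetical Python sketch of a generic DAgger-style loop with a count-and-agreement query gate. It is not the authors' algorithm. The chain environment, `teacher_action`, `should_query`, and the thresholds `MIN_LABELS` and `AGREEMENT` are illustrative assumptions. The toy student also observes the full position, so the sketch omits the information asymmetry the paper addresses.

```python
from collections import Counter, defaultdict

N_POSITIONS = 10  # hypothetical chain length; the goal is the last position
MIN_LABELS = 3    # hypothetical: teacher labels needed before the student acts alone
AGREEMENT = 0.9   # hypothetical: majority-label fraction required to skip a query


def teacher_action(pos):
    """Assumed privileged teacher: always steps right toward the goal."""
    return 1


def should_query(labels):
    """Query when the student has too few labels or its labels disagree."""
    total = sum(labels.values())
    if total < MIN_LABELS:
        return True
    _, top = labels.most_common(1)[0]
    return top / total < AGREEMENT


def run(episodes=20, max_steps=50):
    labels = defaultdict(Counter)  # observation -> Counter of teacher actions
    steps = queries = 0
    for _ in range(episodes):
        pos = 0
        for _ in range(max_steps):
            if pos == N_POSITIONS - 1:
                break  # goal reached
            obs = pos  # toy assumption: the observation is the full position
            if should_query(labels[obs]):
                action = teacher_action(pos)  # teacher intervenes and labels
                labels[obs][action] += 1
                queries += 1
            else:
                action = labels[obs].most_common(1)[0][0]  # student acts alone
            pos = min(max(pos + action, 0), N_POSITIONS - 1)
            steps += 1
    policy = {obs: c.most_common(1)[0][0] for obs, c in labels.items()}
    return steps, queries, policy


steps, queries, policy = run()
print(steps, queries)  # 180 27
```

Under these assumptions, the teacher is queried only during the first three episodes (27 of 180 steps); after that, every position has enough consistent labels and the student acts on its own. If the student could not observe what the teacher conditions on, labels at a single observation could disagree, and the agreement gate would keep routing queries to the teacher there.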

Yujin Kim, Nathaniel Chin, Arnav Vasudev, Sanjiban Choudhury

Subject areas: Computing Technology, Computer Technology

Yujin Kim, Nathaniel Chin, Arnav Vasudev, Sanjiban Choudhury. Distilling Realizable Students from Unrealizable Teachers [EB/OL]. (2025-05-14) [2025-06-12]. https://arxiv.org/abs/2505.09546.