
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning

Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang

Abstract

Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. Recent studies on reasoning VLAs show that explicit chain-of-thought (CoT) can improve generalization, but these models suffer from high inference latency due to lengthy reasoning traces. We propose Fast-ThinkAct, an efficient reasoning framework that achieves compact yet performant planning through verbalizable latent reasoning. Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective to align manipulation trajectories, which transfers both linguistic and visual planning capabilities for embodied control. This enables reasoning-enhanced policy learning that effectively connects compact reasoning to action execution. Extensive experiments across diverse embodied manipulation and reasoning benchmarks demonstrate that Fast-ThinkAct achieves strong performance while reducing inference latency by up to 89.3% relative to state-of-the-art reasoning VLAs, and that it maintains effective long-horizon planning, few-shot adaptation, and failure recovery.

Cite this article

Chi-Pin Huang, Yunze Man, Zhiding Yu, Min-Hung Chen, Jan Kautz, Yu-Chiang Frank Wang, Fu-En Yang. Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning [EB/OL]. (2026-02-24) [2026-04-03]. https://arxiv.org/abs/2601.09708.

Subject classification

Computing technology, computer technology

First posted: 2026-02-24