Fast-Slow Thinking for Large Vision-Language Model Reasoning

Source: arXiv
English Abstract

Recent advances in large vision-language models (LVLMs) have revealed an "overthinking" phenomenon, where models generate verbose reasoning across all tasks regardless of questions. To address this issue, we present FAST, a novel Fast-Slow Thinking framework that dynamically adapts reasoning depth based on question characteristics. Through empirical analysis, we establish the feasibility of fast-slow thinking in LVLMs by investigating how response length and data distribution affect performance. We develop FAST-GRPO with three components: model-based metrics for question characterization, an adaptive thinking reward mechanism, and difficulty-aware KL regularization. Experiments across seven reasoning benchmarks demonstrate that FAST achieves state-of-the-art accuracy with over 10% relative improvement compared to the base model, while reducing token usage by 32.7-67.3% compared to previous slow-thinking approaches, effectively balancing reasoning length and accuracy.
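
The abstract names three components of FAST-GRPO: model-based question characterization, an adaptive thinking reward, and difficulty-aware KL regularization. The Python sketch below illustrates one plausible way such pieces could fit into a GRPO-style group-normalized advantage; the function name, constants, and reward shaping are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def fast_grpo_advantage_sketch(rewards, lengths, difficulty,
                               target_len_easy=64, target_len_hard=512,
                               kl_base=0.04):
    """Hypothetical sketch (not the paper's method): shape a correctness reward
    with an adaptive length ("thinking") term and scale the KL penalty by
    question difficulty, where difficulty is a score in [0, 1]."""
    # Easier questions get a shorter target length; harder ones a longer one.
    target_len = target_len_easy + difficulty * (target_len_hard - target_len_easy)
    # Adaptive thinking reward: penalize deviation from the difficulty-dependent target length.
    length_reward = -np.abs(np.asarray(lengths, dtype=float) - target_len) / target_len
    shaped = np.asarray(rewards, dtype=float) + 0.1 * length_reward
    # GRPO-style group-normalized advantage over the sampled responses.
    adv = (shaped - shaped.mean()) / (shaped.std() + 1e-8)
    # Difficulty-aware KL coefficient: relax the constraint on harder questions
    # so the policy can explore longer reasoning chains.
    kl_coef = kl_base * (1.0 - 0.5 * difficulty)
    return adv, kl_coef

# Example: a group of 4 sampled responses to a relatively easy question.
adv, kl = fast_grpo_advantage_sketch(rewards=[1, 0, 1, 1],
                                     lengths=[50, 400, 80, 60],
                                     difficulty=0.2)
print(adv, kl)
```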

Wenyi Xiao, Leilei Gan, Weilong Dai, Wanggui He, Ziwei Huang, Haoyuan Li, Fangxun Shu, Zhelun Yu, Peng Zhang, Hao Jiang, Fei Wu

Computing Technology; Computer Technology

Wenyi Xiao, Leilei Gan, Weilong Dai, Wanggui He, Ziwei Huang, Haoyuan Li, Fangxun Shu, Zhelun Yu, Peng Zhang, Hao Jiang, Fei Wu. Fast-Slow Thinking for Large Vision-Language Model Reasoning [EB/OL]. (2025-04-25) [2025-07-09]. https://arxiv.org/abs/2504.18458.
