|国家预印本平台
首页|Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

来源:Arxiv_logoArxiv
英文摘要

Recent advances in multi-modal large language models (MLLMs) have significantly improved object-level grounding and region captioning. However, they remain limited in visual relation understanding, struggling even with binary relation detection, let alone \textit{N}-ary relations involving multiple semantic roles. The core reason is the lack of modeling for \textit{structural semantic dependencies} among multi-entities, leading to unreliable outputs, hallucinations, and over-reliance on language priors (\eg, defaulting to ``person drinks a milk'' if a person is merely holding it). To this end, we propose Relation-R1, the \textit{first unified} relation comprehension framework that explicitly integrates cognitive chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and group relative policy optimization (GRPO) within a reinforcement learning (RL) paradigm. Specifically, we first establish foundational reasoning capabilities via SFT, enforcing structured outputs with thinking processes. Then, GRPO is utilized to refine these outputs via multi-rewards optimization, prioritizing visual-semantic grounding over language-induced biases, thereby improving generalization capability. Furthermore, we investigate the impact of various CoT strategies within this framework, demonstrating that a specific-to-general progressive approach in CoT guidance further improves generalization, especially in capturing synonymous \textit{N}-ary relations. Extensive experiments on widely-used PSG and SWiG datasets demonstrate that Relation-R1 achieves state-of-the-art performance in both binary and \textit{N}-ary relation understanding.

Jiahui Li、Kwang-Ting Cheng、Long Chen、Lin Li、Wei Chen

计算技术、计算机技术

Jiahui Li,Kwang-Ting Cheng,Long Chen,Lin Li,Wei Chen.Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension[EB/OL].(2025-04-20)[2025-06-05].https://arxiv.org/abs/2504.14642.点此复制

评论