
SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models

Source: arXiv
English Abstract

The rapid advancement of multi-modal large reasoning models (MLRMs) -- enhanced versions of multimodal large language models (MLLMs) equipped with reasoning capabilities -- has revolutionized diverse applications. However, their safety implications remain underexplored. While prior work has exposed critical vulnerabilities in unimodal reasoning models, MLRMs introduce distinct risks from cross-modal reasoning pathways. This work presents the first systematic safety analysis of MLRMs through large-scale empirical studies comparing MLRMs with their base MLLMs. Our experiments reveal three critical findings: (1) The Reasoning Tax: Acquiring reasoning capabilities catastrophically degrades inherited safety alignment. MLRMs exhibit 37.44% higher jailbreaking success rates than base MLLMs under adversarial attacks. (2) Safety Blind Spots: While safety degradation is pervasive, certain scenarios (e.g., Illegal Activity) suffer 25 times higher attack rates -- far exceeding the average 3.4 times increase -- revealing scenario-specific vulnerabilities with alarming cross-model and cross-dataset consistency. (3) Emergent Self-Correction: Despite tight reasoning-answer safety coupling, MLRMs demonstrate nascent self-correction -- 16.9% of jailbroken reasoning steps are overridden by safe answers, hinting at intrinsic safeguards. These findings underscore the urgency of scenario-aware safety auditing and mechanisms to amplify MLRMs' self-correction potential. To catalyze research, we open-source OpenSafeMLRM, the first toolkit for MLRM safety evaluation, providing a unified interface for mainstream models, datasets, and jailbreaking methods. Our work calls for immediate efforts to harden reasoning-augmented AI, ensuring its transformative potential aligns with ethical safeguards.
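The self-correction finding implies a simple per-sample metric: a response whose reasoning trace is judged jailbroken but whose final answer is judged safe counts as self-corrected. Below is a minimal sketch of how such rates could be computed from per-sample safety judgments; the `JudgedSample` fields and function names are illustrative assumptions, not the actual OpenSafeMLRM API.

```python
# Hypothetical sketch: computing attack-success and self-correction rates
# from per-sample safety judgments. Field and function names are assumed
# for illustration and do not reflect the OpenSafeMLRM interface.
from dataclasses import dataclass


@dataclass
class JudgedSample:
    reasoning_unsafe: bool  # judge flagged the reasoning trace as jailbroken
    answer_unsafe: bool     # judge flagged the final answer as jailbroken


def safety_rates(samples: list[JudgedSample]) -> dict[str, float]:
    n = len(samples)
    jailbroken_reasoning = [s for s in samples if s.reasoning_unsafe]
    # Self-correction: the reasoning was jailbroken, yet the model still
    # produced a safe final answer.
    self_corrected = [s for s in jailbroken_reasoning if not s.answer_unsafe]
    return {
        "answer_asr": sum(s.answer_unsafe for s in samples) / n,
        "reasoning_asr": len(jailbroken_reasoning) / n,
        "self_correction_rate": (
            len(self_corrected) / len(jailbroken_reasoning)
            if jailbroken_reasoning
            else 0.0
        ),
    }

# Under this reading, the paper's 16.9% figure corresponds to a
# self_correction_rate of roughly 0.169 over jailbroken reasoning traces.
```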

Junfeng Fang, Yukai Wang, Ruipeng Wang, Zijun Yao, Kun Wang, An Zhang, Xiang Wang, Tat-Seng Chua

Safety Science

Junfeng Fang, Yukai Wang, Ruipeng Wang, Zijun Yao, Kun Wang, An Zhang, Xiang Wang, Tat-Seng Chua. SafeMLRM: Demystifying Safety in Multi-modal Large Reasoning Models [EB/OL]. (2025-04-09) [2025-06-27]. https://arxiv.org/abs/2504.08813.
