
Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?

Source: arXiv
Abstract

Jailbreak attacks have been observed to largely fail against recent reasoning models enhanced by Chain-of-Thought (CoT) reasoning. However, the underlying mechanism remains underexplored, and relying solely on reasoning capacity may raise security concerns. In this paper, we try to answer the question: Does CoT reasoning really reduce harmfulness from jailbreaking? Through rigorous theoretical analysis, we demonstrate that CoT reasoning has dual effects on jailbreaking harmfulness. Based on the theoretical insights, we propose a novel jailbreak method, FicDetail, whose practical performance validates our theoretical findings.

Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu

Computing technology, computer technology

Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu. Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking? [EB/OL]. (2025-05-23) [2025-07-16]. https://arxiv.org/abs/2505.17650.
