Anxiety Prediction in Narrative Texts Based on Large Language Models: A Comparison of Model Selection and Prompting Styles

Zhai Yutong (翟予童)¹, Wang Wenqian (王文谦)¹, Ren Cheng (任程)¹, Luo Min (罗敏)¹, Zhu Tingshao (朱廷劭)¹

Source: Chinese Academy of Sciences Scientific Paper Preprint Platform (ChinaXiv)


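Author information

1. Department of Psychology, University of Chinese Academy of Sciences, Beijing 101408; Institute of Psychology, Chinese Academy of Sciences, Beijing 100101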

Abstract

Anxiety mainly comprises subjective anxious experience, cognitive vigilance, and physiological and somatic symptoms. With the growth of social media, identifying anxiety from an individual's text to support clinical treatment has broad application prospects. Previous work has mostly relied on traditional machine learning methods to detect anxiety in text, with limited accuracy. Large language models (LLMs) offer a promising new route for identifying anxiety in text. This study examined how well LLMs identify anxiety in self-reported narrative texts across different models and prompt styles. Four experts first independently pre-rated 50 texts and reached good inter-rater agreement; the experts then rated the final set of 252 texts. Qwen-max and Deepseek-reasoner were then used to rate the texts under four prompt strategies (zero-shot, few-shot, chain of thought, and few-shot + chain of thought). Qwen-max reached an agreement of approximately 0.8 with the expert ratings, significantly better than Deepseek-reasoner (0.6 to 0.7). Prompts with both examples and a chain of thought (few-shot + chain of thought) performed best, and prompts with neither (zero-shot) performed worst. Qwen-max was less sensitive to the choice of prompt strategy than Deepseek-reasoner. For Deepseek-reasoner, adding examples to the prompt (few-shot) improved performance more than adding a chain of thought. This study provides a preliminary technical approach for assessing and screening anxiety in text.
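The 2 x 2 prompt design described in the abstract (examples present or absent, chain of thought present or absent) can be sketched as follows. This is a minimal illustration, not the study's code: the prompt wording, the `build_prompt` and `pearson` helpers, and the placeholder examples are all hypothetical. The abstract also does not name the agreement statistic, so Pearson correlation is used here purely as an assumed stand-in.

```python
from math import sqrt

# Hypothetical wording: the study's actual prompts are not given in the abstract.
TASK = ("Rate the level of anxiety expressed in the following self-reported "
        "narrative text and reply with a single number.")
COT = ("Think step by step: consider subjective anxious experience, cognitive "
       "vigilance, and somatic symptoms before giving the final score.")
# Placeholder few-shot examples; real ones would be expert-rated texts.
EXAMPLES = [("<example narrative 1>", "<expert score 1>"),
            ("<example narrative 2>", "<expert score 2>")]

# The four strategies named in the abstract as (few_shot, chain_of_thought) flags.
STRATEGIES = {
    "zero-shot": (False, False),
    "few-shot": (True, False),
    "chain of thought": (False, True),
    "few-shot + chain of thought": (True, True),
}


def build_prompt(text, few_shot, chain_of_thought):
    """Assemble one prompt for the given strategy flags."""
    parts = [TASK]
    if few_shot:
        for ex_text, ex_score in EXAMPLES:
            parts.append(f"Example text: {ex_text}\nExample score: {ex_score}")
    if chain_of_thought:
        parts.append(COT)
    parts.append(f"Text: {text}\nScore:")
    return "\n\n".join(parts)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumed agreement measure; the abstract does not say which one was used.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    for name, (few_shot, cot) in STRATEGIES.items():
        prompt = build_prompt("<narrative text>", few_shot, cot)
        print(f"--- {name} ---\n{prompt}\n")
```

In such a setup, each prompt would be sent to the model under test, the numeric score parsed from the reply, and `pearson(model_scores, expert_scores)` computed once per model and strategy to fill in a comparison like the one the abstract reports.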

Key words

Anxiety/Large language models/Prompt style/Self-reported narrative texts

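Cite this article

Zhai Yutong, Wang Wenqian, Ren Cheng, Luo Min, Zhu Tingshao. Anxiety Prediction in Narrative Texts Based on Large Language Models: A Comparison of Model Selection and Prompting Styles [EB/OL]. (2026-03-09)[2026-03-11]. https://chinaxiv.org/abs/202603.00044.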

Subject classification

Computing technology, computer technology

First published: 2026-03-09

