National Preprint Platform

Research on Suicidal Ideation Text Data Augmentation and Recognition Technology Based on Large Language Models

English Abstract

Suicide constitutes a significant global public health challenge, with the World Health Organization reporting substantial annual mortality rates. Traditional suicide detection methods primarily depend on self-assessment scales and clinical evaluations, which require considerable resources and rely on patients actively seeking assistance. The integrated motivational-volitional (IMV) model offers a theoretical framework for comprehending suicidal behavior progression, with suicidal ideation serving as a critical risk indicator. While text-based analysis presents a promising non-invasive approach for early identification, it encounters technical challenges due to limited annotated data and linguistic complexity. Large Language Models (LLMs) offer unprecedented capabilities in language understanding and generation, potentially addressing these challenges through their ability to comprehend diverse expressions of suicidal ideation and generate high-quality training data.

This research employed a two-stage design leveraging LLMs to address the challenge of limited training data for suicidal ideation recognition. In Study I, we selected ChatGLM3-6B and Qwen-7B-Chat as foundation LLMs and implemented both zero-shot and few-shot learning approaches combined with supervised learning strategies. We extracted examples from an original dataset of Weibo comments to create high-quality training data for the LLMs. Comparative experiments evaluated model performance, with human coders assessing the quality of LLM-generated texts using established suicide risk evaluation criteria. In Study II, we evaluated the impact of LLM-based data augmentation on recognition models by comparing traditional machine learning approaches with LLM-based methods trained on both original and augmented datasets, measuring performance through accuracy and true negative rate metrics.

In Study I, the two self-developed LLM-based models demonstrated excellent performance in suicidal ideation data augmentation, significantly outperforming baseline models according to comprehensive evaluation metrics. The success of these LLM-enhanced models highlighted the effectiveness of high-quality data construction through advanced language modeling capabilities. In Study II, all experimental models trained on LLM-augmented data significantly outperformed their corresponding baseline models in both accuracy and true negative rate. The highest-performing model utilized the ChatGLM3-6B architecture with few-shot learning, showing marked improvements compared to its baseline counterpart. These findings demonstrate the substantial impact of LLM-based data augmentation on model generalization ability, particularly in capturing diverse and subtle expressions of suicidal ideation that traditional approaches often miss.

This study validates the effectiveness of LLM-based data augmentation methods in enhancing suicidal ideation recognition while addressing data scarcity challenges. The non-invasive approach developed through LLM technology has the potential to provide timely and effective early warning of suicide risk while protecting user privacy. This research contributes to both theoretical understanding of LLMs' capabilities in complex psychological text processing and practical applications in mental health monitoring. Future research should explore cross-platform applicability of LLMs, model interpretability, and ethical considerations to further advance this promising technology in suicide prevention and broader mental health applications.
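The abstract names accuracy and true negative rate as the Study II evaluation metrics without defining them. The following is a minimal sketch of the standard definitions, assuming binary labels in which 1 marks a text expressing suicidal ideation and 0 marks a text that does not. The function name `accuracy_and_tnr` and the toy labels are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ.

```python
from typing import Sequence, Tuple


def accuracy_and_tnr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, true negative rate) for binary labels.

    Assumption (not from the paper): 1 = positive class, 0 = negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / (tp + tn + fp + fn)

    # True negative rate (specificity): share of actual negatives
    # that the model correctly labels as negative.
    negatives = tn + fp
    if negatives == 0:
        raise ValueError("TNR is undefined when there are no negative examples")
    tnr = tn / negatives

    return accuracy, tnr


# Toy labels for illustration only; these are not data from the study.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
acc, tnr = accuracy_and_tnr(y_true, y_pred)
print(f"accuracy={acc:.3f}, TNR={tnr:.3f}")  # accuracy=0.750, TNR=0.800
```

Reporting the true negative rate alongside accuracy shows how often non-ideation texts are correctly left unflagged, which accuracy alone can obscure when the two classes are imbalanced.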

Zhang Yanbo (章彦博), Huang Feng (黄峰), Mo Liuling (莫柳铃), Liu Xiaoqian (刘晓倩), Zhu Tingshao (朱廷劭)

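Department of Psychology, University of Chinese Academy of Sciences; Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences
Department of Psychology, University of Chinese Academy of Sciences; Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences; Department of Data Science, College of Computing, City University of Hong Kong
Department of Social Psychology, School of Sociology, Nankai University
Department of Psychology, University of Chinese Academy of Sciences; Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences
Department of Psychology, University of Chinese Academy of Sciences; Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences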

Computing technology; computer technology

Suicidal Ideation; Data Augmentation; Suicide Text Recognition; Large Language Models; Artificial Intelligence

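Zhang Yanbo, Huang Feng, Mo Liuling, Liu Xiaoqian, Zhu Tingshao. Research on Suicidal Ideation Text Data Augmentation and Recognition Technology Based on Large Language Models [EB/OL]. (2025-03-19)[2025-05-09]. https://chinaxiv.org/abs/202503.00213.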
