Integrative Complexity Modeling in English and Chinese Texts Based on Large Language Models
Integrative complexity is a construct used in psychology to measure the structure of an individual's thinking along two dimensions: differentiation and integration. Differentiation is the ability to recognize and understand the different perspectives or elements present in a body of information; integration is the ability to combine those perspectives or elements into a logical, coherent whole. Integrative complexity is measured mainly through manual analysis of text, which may be written material, speeches, interview transcripts, or any other form of oral or written expression. To address the high cost of manual scoring, the low accuracy of existing automated methods, and the lack of any assessment scheme for Chinese text, this study uses large language model text data augmentation and transfer learning to design an automated assessment scheme for integrative complexity in both English and Chinese texts. It also explores automated assessment of the two sub-constructs of integrative complexity: elaborative integrative complexity and dialectical integrative complexity. Two studies were designed and carried out. Study 1 built a prediction model for the integrative complexity of English text using large language model data augmentation; Study 2 built a prediction model for the integrative complexity of Chinese text using transfer learning.
The results were as follows. 1) In Study 1, GPT-3.5-Turbo was used to augment the English text data, a pre-trained multilingual RoBERTa model was used to extract word embeddings, and a text convolutional neural network served as the downstream model. Against human annotation, the model reached a Spearman correlation coefficient of 0.62 for integrative complexity, a correlation coefficient of 0.51 for dialectical integrative complexity, and a Spearman correlation coefficient of 0.60 for elaborative integrative complexity. This outperformed both the machine learning baselines and the neural network models trained without data augmentation.
2) In Study 2, a model with the same neural network architecture as in Study 1 was built, and the final model parameters from Study 1 were transferred to it before training on Chinese text. In the zero-shot setting, the transfer learning model reached a Spearman correlation coefficient of 0.31 for integrative complexity, 0.31 for dialectical integrative complexity, and a correlation coefficient of 0.33 for elaborative integrative complexity, all higher than the same model with randomly initialized parameters (integrative complexity: 0.17; dialectical integrative complexity: 0.10; elaborative integrative complexity: 0.10). In the few-shot setting, the transfer learning model reached a Spearman correlation coefficient of 0.73 for integrative complexity, 0.51 for dialectical integrative complexity, and a correlation coefficient of 0.73 for elaborative integrative complexity.
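All of the results above are reported as Spearman correlations between model predictions and human scores. As a minimal sketch of how such a coefficient can be computed, the following standard-library Python computes the Pearson correlation of tie-averaged ranks. The function names and the sample scores are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: human scores versus model predictions for five texts.
human_scores = [1, 2, 3, 4, 5]
model_scores = [1.2, 1.9, 3.5, 3.1, 4.8]
rho = spearman(human_scores, model_scores)
```

Because the coefficient depends only on rank order, a model can score well without matching the human scale exactly, which suits ordinal ratings such as integrative complexity scores.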
Science and scientific research; computing and computer technology; information and knowledge dissemination
Integrative Complexity; Neural Networks; Large Language Models; Transfer Learning
Integrative Complexity Modeling in English and Chinese Texts Based on Large Language Models [EB/OL]. (2024-04-10) [2025-08-02]. https://chinaxiv.org/abs/202404.00195.