National Preprint Platform

Handling Missing Data in Cognitive Diagnostic Assessment: The Random Forest Threshold Imputation Method


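Handling missing data in cognitive diagnostic assessment is a research topic of great interest to both theorists and practitioners. Drawing on the fact that random forest imputation (RFI) does not depend on assumptions about the missingness mechanism, this study improves the existing RFI method and proposes a new method that uses a person-fit index (RCI) to determine the imputation threshold: random forest threshold imputation (RFTI). Simulation studies show that RFTI achieves markedly higher imputation accuracy than RFI. Compared with RFI and the EM method, RFTI shows a clear advantage in examinees' attribute pattern match ratio and attribute marginal match ratio, and the advantage is more pronounced under the not-missing-at-random and mixed missingness mechanisms and when the proportion of missing data is high. For item parameter estimation, however, RFTI shows no advantage.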
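The two classification criteria named above can be illustrated with a minimal Python sketch. It assumes the usual definitions: the pattern match ratio is the share of examinees whose whole estimated attribute pattern equals the true pattern, and the marginal match ratio is the share of individual attributes classified correctly, pooled over examinees. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def pattern_match_ratio(true_patterns, est_patterns):
    """Share of examinees whose full attribute pattern is recovered exactly."""
    matches = sum(tuple(t) == tuple(e) for t, e in zip(true_patterns, est_patterns))
    return matches / len(true_patterns)


def attribute_match_ratio(true_patterns, est_patterns):
    """Share of single attributes classified correctly, pooled over examinees."""
    total = sum(len(t) for t in true_patterns)
    hits = sum(
        tk == ek
        for t, e in zip(true_patterns, est_patterns)
        for tk, ek in zip(t, e)
    )
    return hits / total


# Hypothetical toy data: 4 examinees, 3 attributes (1 = mastered, 0 = not mastered)
true_alpha = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
est_alpha = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 1]]

pmr = pattern_match_ratio(true_alpha, est_alpha)    # 2 of 4 patterns match
amr = attribute_match_ratio(true_alpha, est_alpha)  # 10 of 12 attributes match
print(pmr, amr)
```

The pattern-level criterion is the stricter of the two: a single misclassified attribute counts the whole pattern as a miss, which is why it is typically lower than the marginal ratio.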

As a new form of test, cognitive diagnostic assessment has attracted wide attention from researchers at home and abroad. At the same time, missing data caused by characteristics of the test design is a common issue in cognitive diagnostic tests. It is therefore important to develop an effective solution for dealing with missing data in cognitive diagnostic assessment, so that the diagnostic feedback provided to both students and teachers is more accurate and reliable. Machine learning has been applied to impute missing data in recent years. Among machine learning algorithms, the random forest has proved to be a state-of-the-art learner: it handles classification and regression tasks effectively and efficiently, and it can solve multi-class classification problems efficiently. It also has a distinct advantage in coping with noise interference. Furthermore, the random forest imputation method, an algorithm for dealing with missing data that builds on the random forest, makes full use of the available response information and the characteristics of participants' response patterns to impute missing data, instead of assuming the missingness mechanism in advance. By combining the strengths of the random forest in classification and prediction with the assumption-free nature of random forest imputation, we attempt to improve the existing random forest imputation algorithm so that it can be properly applied to missing data in cognitive diagnostic assessment.
On the basis of the DINA (Deterministic Inputs, Noisy "And" Gate) model, which is widely used in cognitive diagnostic assessment, we introduce the RCI (Response Conformity Index) into missing data imputation to determine the threshold for the imputation type, and hence propose a new method for handling missing responses in the DINA model: the random forest threshold imputation (RFTI) approach. Two simulation studies were conducted to validate the effectiveness of RFTI, and the advantages of the new method were explored by comparing it with traditional techniques for handling missing data. First, the theoretical basis and algorithm implementation of RFTI were described in detail. Then, two Monte Carlo simulations were employed to validate the effectiveness of RFTI in terms of imputation rate and accuracy as well as accuracy in DINA model parameter estimation. Moreover, the applicability of RFTI was investigated by considering different missingness mechanisms (MNAR, MIXED, MAR and MCAR) and different proportions of missing values (10%, 20%, 30%, 40% and 50%). The main results indicated: (1) the imputation accuracy of RFTI was significantly higher than that of the random forest imputation (RFI) method, and the data missingness rate treated by RFTI was about 10% under all conditions; (2) under all conditions, RFTI yielded the highest attribute pattern match ratio and attribute marginal match ratio of participants compared with the EM algorithm and RFI. The size of this advantage depended on the proportion and mechanism of missing data: it became more obvious when the missingness mechanism was MNAR or MIXED and the proportion of missing responses was more than 30%. However, the new algorithm failed to show superiority in estimating DINA model parameters. Based on these results, we conclude the article with an overall summary and recommendations, as well as directions for future research.
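For readers unfamiliar with the DINA model named above, a minimal Python sketch of its standard item response function may help. Under DINA, an examinee answers an item correctly with probability 1 - s (s is the slip parameter) if they have mastered every attribute the item requires, and with probability g (the guessing parameter) otherwise. The function name, the Q-matrix row, and the parameter values below are hypothetical illustrations and are not taken from the paper.

```python
from math import prod


def dina_prob(alpha, q, slip, guess):
    """P(correct response | attribute pattern alpha) for one item under DINA.

    alpha: examinee's attribute mastery pattern (0/1 per attribute)
    q:     the item's Q-matrix row (1 = attribute required, 0 = not required)
    """
    # eta is 1 only when every required attribute is mastered;
    # attributes with q = 0 contribute a factor of 1 (note 0 ** 0 == 1 in Python).
    eta = prod(a ** qk for a, qk in zip(alpha, q))
    return (1 - slip) if eta == 1 else guess


# Hypothetical item that requires attributes 1 and 3
q_row = [1, 0, 1]
p_master = dina_prob([1, 1, 1], q_row, slip=0.1, guess=0.2)     # has all required attributes
p_nonmaster = dina_prob([1, 1, 0], q_row, slip=0.1, guess=0.2)  # lacks attribute 3
print(p_master, p_nonmaster)
```

Because the response probability takes only two values per item, a missing response carries no information by itself about which group the examinee belongs to, which is why the quality of imputation matters for attribute classification.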

Yang Jianqin (杨建芹), You Xiaofeng (游晓锋), Qin Chunying (秦春影), Liu Hongyun (刘红云)

10.12074/202303.08264V1

Education; science and scientific research; computing technology and computer technology

Missing data; cognitive diagnostic assessment; random forest threshold imputation; random forest imputation; EM algorithm

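Yang Jianqin, You Xiaofeng, Qin Chunying, Liu Hongyun. Handling missing data in cognitive diagnostic assessment: the random forest threshold imputation method [EB/OL]. (2023-03-16) [2025-08-02]. https://chinaxiv.org/abs/202303.08264.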
