Development and Application of a Graded Response Model Incorporating Response Times
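Research in China and abroad on measurement models that incorporate response times has mostly been based on dichotomous (0-1) scoring. In practical testing situations, however (for example, multiple-select items, computation items, and word problems in mathematics tests), graded scoring is often used. Building on the hierarchical modeling framework, this paper incorporates response-time information into the graded response model (GRM) to construct a graded response model with response times, GRM-RT. Drawing on existing empirical studies, the study conditions were set accordingly, with a focus on parameter recovery under different sample sizes and test lengths. The new model was then applied to empirical data, both to illustrate its use and to compare the relative fit of different models. The results show that parameter recovery of the GRM-RT model was good and fairly stable under all experimental conditions; the empirical analysis further demonstrates the model's practical value.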
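As a rough illustration of how a graded response model can be joined with response times under a hierarchical framework, the sketch below computes GRM category probabilities and adds a log-density for the response time. The lognormal response-time component and all names (`theta`, `tau`, `a`, `b`, `beta`, `alpha`) are assumptions made for illustration; the abstract does not state the exact specification used in GRM-RT.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """GRM probabilities P(X = k), k = 0..K, for ordered thresholds b (length K)."""
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k) for k = 1..K
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    # Pad with P(X >= 0) = 1 and P(X >= K + 1) = 0, then take differences
    cum = np.concatenate(([1.0], p_star, [0.0]))
    return cum[:-1] - cum[1:]

def lognormal_rt_logpdf(t, tau, beta, alpha):
    """Assumed RT model: log t ~ Normal(beta - tau, 1 / alpha**2)."""
    z = alpha * (np.log(t) - (beta - tau))
    return np.log(alpha) - np.log(t) - 0.5 * np.log(2.0 * np.pi) - 0.5 * z**2

def joint_loglik(x, t, theta, tau, a, b, beta, alpha):
    """Log-likelihood of one scored response x and its response time t,
    treating the two as conditionally independent given theta and tau."""
    probs = grm_category_probs(theta, a, b)
    return np.log(probs[x]) + lognormal_rt_logpdf(t, tau, beta, alpha)

# Hypothetical item with four score categories (three thresholds)
probs = grm_category_probs(theta=0.3, a=1.2, b=[-1.0, 0.0, 1.0])
ll = joint_loglik(x=2, t=35.0, theta=0.3, tau=0.1, a=1.2,
                  b=[-1.0, 0.0, 1.0], beta=3.5, alpha=1.5)
```

In a full hierarchical model the person parameters (`theta`, `tau`) would additionally share a joint population distribution, which this sketch leaves out.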
With the widespread use of computer-based assessment, educators can now collect process data in addition to traditional observed responses. Among these process data, response time is particularly significant because it reflects examinees' response speed, cognitive processes, and degree of engagement with the test items. As information technology advances, assessment results are no longer confined to conventional item scores. By considering response time, researchers and practitioners can obtain a dynamic, multidimensional view of how examinees behave and interact with the assessment, which deepens our understanding of test-taking processes. Previous research on Item Response Theory (IRT) models incorporating response time, both in China and internationally, has mainly focused on dichotomous (0 and 1) scoring. However, in practical testing situations, such as multiple-choice items, constructed-response tasks, and essays, polytomous (graded) scoring is commonly used. To fill this research gap, the present study extends the Graded Response Model (GRM) by integrating response-time information. The resulting GRM-RT model for polytomous data is proposed, and its parameters are estimated using the Hamiltonian Monte Carlo (HMC) method.
To verify the accuracy and robustness of the GRM-RT model, a simulation study was carried out to examine the precision of the parameter estimates. The simulation adopted a 2 × 2 × 3 factorial design, with examinee ability distribution, sample size, and test length as the independent variables. Estimation accuracy was evaluated using Root Mean Square Error (RMSE) and Bias. Convergence of the parameter estimates was assessed using trace plots of the Markov chains and the potential scale reduction factor (R̂).
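The evaluation criteria named above can be sketched as follows. The Bias and RMSE functions follow their standard definitions; the R̂ function is the classic between/within-chain version, which is an assumption here, since the abstract does not say which variant was used (HMC software often reports a split or rank-normalized variant). All numeric values are made up for illustration.

```python
import numpy as np

def bias(estimates, truth):
    """Mean signed difference between estimates and true values."""
    return float(np.mean(np.asarray(estimates) - np.asarray(truth)))

def rmse(estimates, truth):
    """Root mean square error between estimates and true values."""
    diff = np.asarray(estimates) - np.asarray(truth)
    return float(np.sqrt(np.mean(diff**2)))

def rhat(chains):
    """Classic potential scale reduction factor for an (m chains, n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    var_hat = (n - 1) / n * w + b / n          # pooled variance estimate
    return float(np.sqrt(var_hat / w))

# Hypothetical true and estimated discrimination parameters
true_a = [0.5, 1.0, 1.5]
est_a = [0.6, 0.9, 1.7]
b_val = bias(est_a, true_a)
r_val = rmse(est_a, true_a)

# Four well-mixed chains drawn from the same distribution give a value near 1
rng = np.random.default_rng(0)
r_hat = rhat(rng.normal(size=(4, 1000)))
```

In a simulation study these quantities would typically be averaged over replications within each design cell.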
The results showed that the examinee parameter estimates were satisfactory, consistently reaching desirable precision even with small sample sizes and short tests. This indicates that the estimation method remains reliable and stable even when data are limited. Item parameters were also estimated with high precision; under the normal ability distribution, precision varied little across conditions, and the variation was attributable mainly to random error. Under the skewed distribution, the precision of the parameter estimates remained high, suggesting that the model is robust across distributional scenarios.
In the empirical study, a mathematics literacy scale from the field of education was analyzed. Model fit was evaluated using the Widely Applicable Information Criterion (WAIC) and Leave-One-Out cross-validation (LOO). The results showed that the GRM-RT model fit better than the RTs-mGPCM model. In addition, the estimates of the scale's item parameters indicated that the model had lower measurement error, further highlighting the psychometric performance of the GRM-RT model and its promise for applications in education.
These findings suggest that the GRM-RT model has considerable application potential and can serve as a useful tool for data analysis in education and psychology.
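For readers unfamiliar with WAIC, a minimal sketch of its usual computation from posterior draws is given below. It assumes a matrix of pointwise log-likelihoods with one row per posterior draw and one column per observation; the function name and the toy input are hypothetical, and this is not the authors' code.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from an (S draws, N observations) log-likelihood matrix."""
    log_lik = np.asarray(log_lik, dtype=float)
    # Log pointwise predictive density, computed stably via a max shift
    m = log_lik.max(axis=0)
    lppd_i = m + np.log(np.mean(np.exp(log_lik - m), axis=0))
    # Effective number of parameters: posterior variance of each pointwise log-likelihood
    p_waic_i = log_lik.var(axis=0, ddof=1)
    return float(-2.0 * (lppd_i.sum() - p_waic_i.sum()))

# Toy input: 10 draws, 5 observations, constant log-likelihood of -1
toy = np.full((10, 5), -1.0)
toy_waic = waic(toy)
```

Lower values indicate better expected out-of-sample fit, so two models fitted to the same data are compared by their WAIC difference.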
罗照盛、彭亚风、刘志城、秦春影、喻晓锋
Education
Polytomous Scoring; Graded Response Model (GRM); Response Times; Modeling
罗照盛, 彭亚风, 刘志城, 秦春影, 喻晓锋. Development and Application of Graded Response Model Incorporating Response Times [EB/OL]. (2025-01-14)[2025-08-23]. https://chinaxiv.org/abs/202501.00169.