ChinaXiv (National Preprint Platform, 国家预印本平台)

Measurement Reliability of Cognitive Tasks: Progress and Prospects

Abstract

Cognitive tasks are fundamental tools in experimental psychology and cognitive neuroscience, extensively used to probe cognitive mechanisms and assess dysfunction across diverse domains. Despite their ability to produce robust group-level effects, recent studies have raised concerns about their low reliability in capturing individual differences. This seeming discrepancy between robust group-level effects and poor individual-level reliability, known as the "reliability paradox," highlights a critical challenge for the application of cognitive tasks to individual-level inference. The paradox is particularly consequential given the increasing use of cognitive tasks in real-life settings such as clinical diagnostics and personalized intervention. However, existing discussions of this issue remain fragmented and lack a comprehensive framework for understanding its causes and identifying viable solutions.

We summarize the issues surrounding the reliability paradox of cognitive tasks and categorize them into two core challenges. The first pertains to the hierarchical data structure intrinsic to cognitive tasks, where data are nested within trials, blocks, and subjects. The second concerns construct validity: most tasks are developed to test the effectiveness of experimental manipulations rather than to measure well-defined cognitive constructs, which are typically of primary interest in individual differences research. Relatedly, a weaker form of the construct validity problem is the variability of indicators used to represent individual differences in cognitive performance. A single task may yield many possible indicators, either direct outcomes (e.g., reaction times, accuracy) or derived metrics (e.g., efficiency, sensitivity).
These issues are historical and stem from a lack of communication between the experimental and correlational traditions in psychology.

The challenge of hierarchical data structure has received increasing attention in recent years, and new reliability metrics tailored to cognitive tasks have been developed. These include split-half reliability and intraclass correlation coefficients (ICCs). Empirical evidence suggests that permutation-based split-half reliability demonstrates superior robustness by effectively accounting for trial-level variability and task-specific noise. For repeated-measures designs, ICC(2,1) and ICC(3,1) are recommended, as they provide complementary insights into the generalizability and sample specificity of task performance. We present a practical guide for estimating the reliability of tasks with hierarchical data.

The second challenge concerns the heterogeneity and arbitrariness of indicators selected from task outcomes to assess individual differences. The reliability of different indicators from the same task often varies substantially. We argue that such heterogeneity and arbitrariness arise from a lack of construct validity: the link between an indicator and the underlying cognitive construct is rarely well defined.

Given the complexity of the reliability issues in cognitive tasks, improving reliability requires multifaceted efforts. First and most importantly, construct validity should be tested and enhanced. For example, researchers may employ multi-task designs and latent modeling approaches to identify underlying constructs. Computational modeling also offers promise for more accurately capturing cognitive processes. Second, as noted in prior literature, optimizing task design can improve reliability. Strategies such as adjusting difficulty levels, increasing trial counts, incorporating gamification elements, and minimizing environmental noise can enhance measurement precision and between-subject variance.
Third, new statistical models for estimating task reliability are needed. Reliability metrics that reflect the multilevel structure of task data (e.g., multilevel modeling, signal-to-noise ratio) should be more widely adopted. Finally, we recommend integrating modern psychometric frameworks, including item response theory and generalizability theory, to model error variance across trials, contexts, and individuals with greater granularity.
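The permutation-based split-half approach mentioned above can be sketched in code. The sketch below is illustrative, not the authors' implementation: the function name, the simulated data, and the choice of per-subject trial means as the indicator are assumptions. Trials are repeatedly split at random into two halves, the per-subject half scores are correlated across subjects, and each correlation is adjusted with the Spearman-Brown formula before averaging.

```python
import numpy as np

def permutation_split_half(scores, n_perm=5000, seed=0):
    """Permutation-based split-half reliability.

    scores: (n_subjects, n_trials) array of a trial-level outcome
    (e.g., reaction times). Trials are randomly split into two halves;
    per-subject half means are correlated across subjects, corrected
    with the Spearman-Brown formula, and averaged over permutations.
    """
    rng = np.random.default_rng(seed)
    n_sub, n_trials = scores.shape
    half = n_trials // 2
    rs = np.empty(n_perm)
    for p in range(n_perm):
        order = rng.permutation(n_trials)           # random trial split
        a = scores[:, order[:half]].mean(axis=1)    # per-subject mean, half A
        b = scores[:, order[half:2 * half]].mean(axis=1)  # half B
        r = np.corrcoef(a, b)[0, 1]
        rs[p] = 2 * r / (1 + r)                     # Spearman-Brown correction
    return rs.mean()
```

Averaging over many random splits avoids the arbitrariness of a single odd/even split, which is one reason this variant tends to be more robust.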
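For repeated-measures (e.g., test-retest) data, ICC(2,1) and ICC(3,1) can be computed from two-way ANOVA mean squares following the Shrout and Fleiss conventions. This is a minimal sketch assuming a subjects-by-sessions matrix of summary scores; the function name and simulated data are illustrative.

```python
import numpy as np

def icc(y):
    """ICC(2,1) and ICC(3,1) for an (n_subjects, k_sessions) score matrix.

    Computed from two-way ANOVA mean squares: ICC(2,1) treats sessions as
    random (absolute agreement); ICC(3,1) treats them as fixed (consistency).
    """
    n, k = y.shape
    grand = y.mean()
    ss_r = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between subjects
    ss_c = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between sessions
    ss_e = np.sum((y - grand) ** 2) - ss_r - ss_c      # residual
    ms_r = ss_r / (n - 1)
    ms_c = ss_c / (k - 1)
    ms_e = ss_e / ((n - 1) * (k - 1))
    icc31 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
    icc21 = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
    return icc21, icc31
```

When sessions differ systematically (e.g., practice effects), ICC(2,1) falls below ICC(3,1), which is why the two provide the complementary perspectives on generalizability and sample specificity noted above.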

Zhu Pengpeng (朱芃芃), Liu Zheng (刘铮), Kang Chunhua (康春花), Hu Chuan-Peng (胡传鹏)

Jiangsu Provincial University Philosophy and Social Science Laboratory (Laboratory for Adolescent Education and Intelligent Support, Nanjing Normal University), Nanjing 210024; School of Psychology, Nanjing Normal University, Nanjing 210024; School of Humanities and Social Science, The Chinese University of Hong Kong, Shenzhen, Shenzhen 518172; Zhejiang Provincial Intelligent Laboratory of Child and Adolescent Mental Health and Crisis Intervention, Jinhua 321004

Medical research methods

Cognitive tasks; reliability paradox; reliability; individual differences; inter-individual (between-subject) differences

Zhu Pengpeng, Liu Zheng, Kang Chunhua, Hu Chuan-Peng. Measurement reliability of cognitive tasks: Progress and prospects [EB/OL]. (2025-07-30) [2025-08-31]. https://chinaxiv.org/abs/202503.00257.
