National Preprint Platform

Interpreting Nonsignificant Results: A Quantitative Analysis Based on 500 Empirical Studies

Interpreting Nonsignificant Results: A Quantitative Investigation Based on 500 Chinese Psychological Research


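Nonsignificant results (p > 0.05) are very common in psychological research. However, they are easily misread as evidence for accepting the null hypothesis, which can lead to erroneous inferences about group matching or cause real effects masked by nonsignificant results from small samples to be overlooked. To date, no empirical study in China has surveyed how prevalent nonsignificant results are or how they are interpreted. The present study surveyed 500 empirical papers in Chinese psychology, assessed how often negative statements appear in their abstracts, judged whether the inferences drawn from those negative statements were accurate, and used Bayes factors to re-evaluate the nonsignificant results that reported t values. The results show that 36% of abstracts mentioned nonsignificant results, containing 236 negative statements in total; 41% of the negative statements misinterpreted nonsignificant results (e.g., as support for the null hypothesis); and the Bayes factor analysis of the studies reporting t values showed that only 5.1% of the nonsignificant results provided strong evidence for the null hypothesis (BF01 > 10). Compared with a previous survey of international psychology journals (30% of abstracts contained negative statements; 70% of negative statements misinterpreted nonsignificant results), Chinese psychology journals showed both a higher proportion of reported nonsignificant results and a higher rate of correct interpretation. Nevertheless, researchers in China still need to deepen their understanding of nonsignificant results and promote statistical methods suited to evaluating them.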

Background: The p-value is the most widely used statistical index for inference in science. A p-value greater than 0.05, i.e., a nonsignificant result, however, cannot distinguish between two situations: absence of evidence and evidence of absence. Unfortunately, researchers in psychological science may not interpret p-values correctly, which can lead to mistaken statistical inferences based on nonsignificant results. Indeed, Aczel et al. (2019) surveyed empirical studies published in three journals (Psychonomic Bulletin & Review, Journal of Experimental Psychology: General, and Psychological Science) and found that about 72% of nonsignificant results were misinterpreted as evidence in favor of the null hypothesis. Misinterpreting nonsignificant results can have severe consequences. One is the dismissal of nonsignificant results as null effects, which ignores small but meaningful effects (e.g., Jia et al., 2018). More importantly, misinterpreting nonsignificant differences in certain traits (e.g., age, gender) in matched-group clinical trials may create a falsely matched group and thus render the estimated effect of the intervention meaningless. As psychological science keeps growing in China, it is important to assess how nonsignificant results are interpreted in empirical studies published in Chinese journals. However, no such meta-research has been done. To fill this gap, we surveyed 500 empirical papers published in five major Chinese psychological journals to address the following questions: (1) how often are nonsignificant results reported, that is, how severe is publication bias; (2) how do researchers interpret nonsignificant results in their own studies; and (3) when researchers interpret nonsignificant results as evidence of absence, do the empirical data provide enough support for the null effect?
Method: Following our pre-registration (https://osf.io/czx6f), we randomly selected empirical research papers published in 2017 and 2018 in five prominent Chinese journals (Acta Psychologica Sinica, Psychological Science, Chinese Journal of Clinical Psychology, Psychological Development and Education, Psychological and Behavioral Studies). First, we randomly selected 500 empirical papers according to each journal's publication volume. Second, we screened the abstracts of the selected articles and judged whether they contained negative statements. Third, we classified each negative statement into one of four categories (Correct-frequentist; Incorrect-frequentist: whole population; Incorrect-frequentist: current sample; Difficult to judge). Finally, we calculated Bayes factors from the t values and sample sizes associated with the nonsignificant results to investigate whether the empirical data provide enough evidence in favor of the null hypothesis.
Results: Our survey revealed that (1) of the 500 empirical papers, 36% of abstracts (n = 180) mentioned nonsignificant results; (2) the articles contained 236 negative statements referring to the nonsignificant results mentioned in their abstracts, and 41% of these statements misinterpreted nonsignificant results, i.e., the authors inferred that the results provided evidence for the absence of an effect; (3) 5.1% (n = 2) of the nonsignificant results provided strong evidence in favor of the null hypothesis (BF01 > 10). Compared with the results of Aczel et al. (2019), empirical papers published in Chinese journals reported more nonsignificant results (36% vs. 32%), and researchers made fewer misinterpretations of nonsignificant results (41% vs. 72%). It is worth noting that one type of statement about nonsignificant results is ambiguous in the Chinese context: "there is no significant difference between condition A and condition B." This statement has two interpretations: it can be read as another way of saying that the difference is statistically nonsignificant, or as a claim that there is no difference between condition A and condition B. The percentage of misinterpretations rises to 61% under the second interpretation, compared with 41% under the first.
Conclusion: The results suggest that Chinese researchers need to improve their understanding of nonsignificant results and use more appropriate statistical methods to extract information from them. More precise wording should also be used in the Chinese context.
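
The final Method step, computing a Bayes factor from a reported t value and sample size, can be illustrated with a minimal sketch. The abstract does not say which Bayes factor or prior was used, so the code below swaps in the BIC approximation to the Bayes factor (Wagenmakers, 2007) rather than reproducing the authors' procedure. The function names `bf01_bic` and `is_strong_null_evidence` and all example numbers are hypothetical.

```python
import math


def bf01_bic(t: float, n: int, df: int) -> float:
    """Approximate BF01 (evidence for H0 over H1) from a t statistic.

    Uses the BIC approximation BF01 ~ sqrt(n) * (1 + t^2 / df) ** (-n / 2),
    where n is the total number of observations and df is the residual
    degrees of freedom of the t test. This is an illustrative stand-in,
    not necessarily the Bayes factor used in the surveyed study.
    """
    if n <= 0 or df <= 0:
        raise ValueError("n and df must be positive")
    return math.sqrt(n) * (1.0 + t * t / df) ** (-n / 2.0)


def is_strong_null_evidence(bf01: float, threshold: float = 10.0) -> bool:
    """Apply the BF01 > 10 criterion for strong evidence quoted in the abstract."""
    return bf01 > threshold


# Hypothetical nonsignificant results: (t, total n, df) for one-sample or paired tests.
examples = [(0.0, 100, 99), (0.5, 40, 39), (0.2, 400, 399)]
for t, n, df in examples:
    bf01 = bf01_bic(t, n, df)
    print(f"t={t}, n={n}: BF01={bf01:.2f}, strong={is_strong_null_evidence(bf01)}")
```

Under this approximation BF01 cannot exceed sqrt(n), so a test with 100 or fewer observations can never pass the BF01 > 10 threshold, however small the t value. A default-prior Bayes factor (for example the JZS prior) would give different numbers, so the sketch is useful only for seeing why few nonsignificant results count as strong evidence for the null.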

10.12074/202003.00056V1

Science, scientific research

Nonsignificant results; Null-hypothesis significance testing; Bayes factors; Meta-research

Interpreting Nonsignificant Results: A Quantitative Analysis Based on 500 Empirical Studies [EB/OL]. (2020-03-22) [2025-06-28]. https://chinaxiv.org/abs/202003.00056.
