The effects of data adequacy and calibration size on the accuracy of presence-only species distribution models

Source: bioRxiv
Abstract

Presence-only data used to develop species distribution models are often biased towards areas that are frequently surveyed. Furthermore, the size of the calibration area relative to the area covered by the species occurrences has been shown to affect model accuracy. However, existing assessments of the effect of data inadequacy and calibration size on model accuracy have predominantly been conducted using empirical studies. These studies can give ambiguous results, since the data used to train and to test the model can both be biased.

These limitations were addressed by using simulated data to assess how inadequate data coverage and the size of the calibration area affect the accuracy of species distribution models generated by MaxEnt and BIOCLIM. The validity of four presence-only performance measures, the Contrast Validation Index (CVI), the Boyce index, AUC and AUCratio, was also assessed.

CVI, AUC and AUCratio ranked the accuracy of univariate models correctly according to the true importance of their defining environmental variable, a desirable property of an accuracy measure. In contrast, the Boyce index failed to rank the accuracy of univariate models correctly, and a high percentage of irrelevant variables produced models with a high Boyce index.

Inadequate data coverage and increased calibration area reduced model accuracy by reducing the correct identification of the dominant environmental determinant. BIOCLIM outperformed MaxEnt in predicting the true distribution of simulated species with a symmetric dominant response. However, MaxEnt outperformed BIOCLIM in predicting the true distribution of simulated species with skewed and linear dominant responses. Despite this, the standard performance measures consistently overestimated the performance of MaxEnt models and showed them as always having higher accuracy than the BIOCLIM models.

It has been acknowledged that research should be directed towards testing and improving species distribution modelling tools, particularly how to handle the inevitable bias and scarcity of species occurrence data. Simulated data, as demonstrated here, provide a powerful approach to comprehensively test the performance of modelling tools and to disentangle the effects of data properties and modelling options on model accuracy. This may be impossible to achieve using real-world data.

Authors: Santika Truly, Wilson Kerrie A., Hutchinson Michael F.

Affiliations (in author order):
The University of Queensland, School of Biological Sciences; ARC Centre of Excellence for Environmental Decisions (CEED), The University of Queensland
The University of Queensland, School of Biological Sciences; ARC Centre of Excellence for Environmental Decisions (CEED), The University of Queensland
Fenner School of Environment and Society, The Australian National University

DOI: 10.1101/775700

Subjects: Environmental biology; Biological science research methods and techniques; Environmental science theory

Keywords: AUC, AUCratio, BIOCLIM, Boyce index, calibration size, Contrast Validation Index (CVI), MaxEnt, model accuracy, presence-only data

Santika Truly, Wilson Kerrie A., Hutchinson Michael F. The effects of data adequacy and calibration size on the accuracy of presence-only species distribution models [EB/OL]. (2025-03-28) [2025-04-24]. https://www.biorxiv.org/content/10.1101/775700.
