Statistical parametric simulation studies based on real data
Simulation studies are indispensable for evaluating and comparing statistical methods. The most common simulation approach is parametric simulation, where the data-generating mechanism (DGM) corresponds to a predefined parametric model from which observations are drawn. Many statistical simulation studies aim to provide practical recommendations on a method's suitability for a given application; however, parametric simulations in particular are frequently criticized for being too simplistic and not reflecting reality. To overcome this drawback, using real data to construct the parametric DGMs is generally considered a sensible approach. However, while the concept of real-data-based parametric DGMs is widely recognized, the specific ways in which DGM components are inferred from real data vary, and the implications of these choices may not always be well understood. Additionally, researchers often rely on a limited selection of real datasets, and the rationale for selecting them is frequently unclear. This paper addresses these issues by formally discussing how components of parametric DGMs can be inferred from real data and how dataset selection can be performed more systematically. By doing so, we aim to support researchers in conducting simulation studies with a lower risk of overgeneralization and misinterpretation. We illustrate the construction of parametric DGMs based on a systematically selected set of real datasets using two examples: one on ordinal outcomes in randomized controlled trials and one on differential gene expression analysis.
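To make the idea of a real-data-based parametric DGM concrete, the sketch below follows the two steps the abstract describes: infer the parameters of a predefined parametric model from a real dataset, then draw new simulated datasets from the fitted model. It is a minimal illustration under stated assumptions, not the authors' method. The abstract does not specify the DGMs used in the paper, so a categorical distribution per trial arm (parameters estimated by relative frequencies) serves here as a deliberately simple stand-in for an ordinal-outcome RCT. The dataset `real_data` and the helper names `estimate_category_probs` and `simulate_trial` are hypothetical.

```python
import random
from collections import Counter

def estimate_category_probs(outcomes, n_categories):
    """Estimate categorical-distribution parameters (one probability per
    ordinal level) from observed outcomes via relative frequencies,
    which are the maximum-likelihood estimates for this model."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return [counts.get(k, 0) / n for k in range(n_categories)]

def simulate_trial(probs_by_arm, n_per_arm, rng):
    """Draw one synthetic trial: n_per_arm ordinal outcomes per arm,
    sampled from the fitted categorical distribution of that arm."""
    return {
        arm: rng.choices(range(len(p)), weights=p, k=n_per_arm)
        for arm, p in probs_by_arm.items()
    }

# Hypothetical "real" data (illustration only): ordinal levels 0..3 in two arms.
real_data = {
    "control":   [0, 0, 1, 1, 1, 2, 2, 3, 0, 1],
    "treatment": [1, 2, 2, 3, 3, 2, 1, 3, 2, 0],
}
N_CATEGORIES = 4

# Step 1: infer the DGM parameters from the real dataset.
probs_by_arm = {
    arm: estimate_category_probs(y, N_CATEGORIES) for arm, y in real_data.items()
}

# Step 2: draw simulated datasets from the fitted parametric DGM
# (seeded generator so the simulation is reproducible).
rng = random.Random(2025)
simulated = [simulate_trial(probs_by_arm, n_per_arm=50, rng=rng) for _ in range(3)]

print(probs_by_arm)
```

With the hypothetical data above, the fitted control-arm probabilities are [0.3, 0.4, 0.2, 0.1] and the treatment-arm probabilities are [0.1, 0.2, 0.4, 0.3]. In a realistic study, choices such as which model family to fit, which parameters to take from the data and which to vary, and which datasets to use are exactly the decisions whose implications the paper discusses.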
Christina Sauer, F. Julian D. Lange, Maria Thurow, Ina Dormuth, Anne-Laure Boulesteix
Mathematics
Christina Sauer, F. Julian D. Lange, Maria Thurow, Ina Dormuth, Anne-Laure Boulesteix. Statistical parametric simulation studies based on real data [EB/OL]. (2025-04-07) [2025-05-06]. https://arxiv.org/abs/2504.04864.