Generating Accurate Synthetic Survival Data by Conditioning on Outcomes
Synthetically generated data can improve privacy, fairness, and data accessibility; however, generating it can be challenging in specialized scenarios such as survival analysis. One key challenge in this setting is censoring, i.e., the timing of an event is unknown in some cases. Existing methods struggle to accurately reproduce the distributions of both observed and censored event times when generating synthetic data. We propose a conceptually simple approach that generates covariates conditioned on event times and censoring indicators by leveraging existing tabular data generation models, without making assumptions about the mechanism underlying censoring. Experiments on real-world datasets demonstrate that our method consistently outperforms baselines and improves downstream survival model performance.
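The conditioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how (event time, censoring indicator) pairs are obtained at generation time, so the sketch assumes they are resampled jointly from the real data, and it swaps in a simple linear-Gaussian model of the covariates given log-time, fitted separately per censoring indicator, as a stand-in for the tabular data generation model. The function names `fit_conditional_gaussian` and `sample_synthetic` and the toy data are hypothetical.

```python
import numpy as np

def fit_conditional_gaussian(X, t, e):
    """Fit x | t as a linear-Gaussian model, separately for each censoring flag.

    Stand-in for a conditional tabular generator (an assumption of this sketch).
    """
    models = {}
    for flag in np.unique(e):
        mask = e == flag
        # Design matrix: intercept plus log(1 + time).
        A = np.column_stack([np.ones(mask.sum()), np.log1p(t[mask])])
        coef, *_ = np.linalg.lstsq(A, X[mask], rcond=None)
        resid = X[mask] - A @ coef
        models[int(flag)] = (coef, resid.std(axis=0))
    return models

def sample_synthetic(models, t, e, n, rng):
    """Draw (t, e) pairs, then generate covariates conditioned on them."""
    # Assumption: outcomes are resampled jointly from the real data, so the
    # observed and censored time distributions are preserved by construction.
    idx = rng.integers(0, len(t), size=n)
    t_syn, e_syn = t[idx], e[idx]
    d = next(iter(models.values()))[0].shape[1]
    X_syn = np.empty((n, d))
    for flag in np.unique(e_syn):
        coef, sd = models[int(flag)]
        mask = e_syn == flag
        A = np.column_stack([np.ones(mask.sum()), np.log1p(t_syn[mask])])
        X_syn[mask] = A @ coef + rng.normal(size=(mask.sum(), d)) * sd
    return X_syn, t_syn, e_syn

# Toy survival data (hypothetical): event times depend on the first covariate,
# and censoring times are drawn independently.
rng = np.random.default_rng(0)
n_real = 500
X_real = rng.normal(size=(n_real, 3))
t_event = rng.exponential(scale=np.exp(0.5 * X_real[:, 0]))
t_cens = rng.exponential(scale=2.0, size=n_real)
t_real = np.minimum(t_event, t_cens)          # observed time
e_real = (t_event <= t_cens).astype(int)      # 1 = event observed, 0 = censored

models = fit_conditional_gaussian(X_real, t_real, e_real)
X_syn, t_syn, e_syn = sample_synthetic(models, t_real, e_real, 1000, rng)
```

In this sketch the outcome columns are never modeled parametrically, which mirrors the abstract's point that no assumption about the censoring mechanism is required; only the covariate generator would need to be replaced by a stronger conditional model.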
Mohd Ashhad, Ricardo Henao
Current state and development of the biological sciences; research methods and techniques in the biological sciences
Mohd Ashhad, Ricardo Henao. Generating Accurate Synthetic Survival Data by Conditioning on Outcomes [EB/OL]. (2025-08-05) [2025-08-16]. https://arxiv.org/abs/2405.17333.