Benchmarking Differentially Private Tabular Data Synthesis

Abstract

Differentially private (DP) tabular data synthesis generates artificial data that preserves the statistical properties of private data while safeguarding individual privacy. The emergence of diverse algorithms in recent years has introduced challenges in practical applications, such as inconsistent data processing methods, lack of in-depth algorithm analysis, and incomplete comparisons due to overlapping development timelines. These factors create significant obstacles to selecting appropriate algorithms. In this paper, we address these challenges by proposing a benchmark for evaluating tabular data synthesis methods. We present a unified evaluation framework that integrates data preprocessing, feature selection, and synthesis modules, facilitating fair and comprehensive comparisons. Our evaluation reveals that a significant utility-efficiency trade-off exists among current state-of-the-art methods. Some statistical methods are superior in synthesis utility, but their efficiency is not as good as most machine learning-based methods. Furthermore, we conduct an in-depth analysis of each module with experimental validation, offering theoretical insights into the strengths and limitations of different strategies.

Authors: Kai Chen, Xiaochen Li, Chen Gong, Ryan McKenna, Tianhao Wang

Subject classification: Computing technology, computer technology

Recommended citation: Kai Chen, Xiaochen Li, Chen Gong, Ryan McKenna, Tianhao Wang. Benchmarking Differentially Private Tabular Data Synthesis [EB/OL]. (2025-04-18) [2025-04-29]. https://arxiv.org/abs/2504.14061.
