
Bounding User Contributions for User-Level Differentially Private Mean Estimation

Source: arXiv
Abstract

We revisit the problem of releasing the sample mean of bounded samples in a dataset, privately, under user-level $\varepsilon$-differential privacy (DP). We aim to derive the optimal method of preprocessing data samples, within a canonical class of processing strategies, in terms of the error in estimation. Typical error analyses of such \emph{bounding} (or \emph{clipping}) strategies in the literature assume that the data samples are independent and identically distributed (i.i.d.), and sometimes also that all users contribute the same number of samples (data homogeneity) -- assumptions that do not accurately model real-world data distributions. Our main result in this work is a precise characterization of the preprocessing strategy that gives rise to the smallest \emph{worst-case} error over all datasets -- a \emph{distribution-independent} error metric -- while allowing for data heterogeneity. We also show via experimental studies that even for i.i.d. real-valued samples, our clipping strategy performs much better, in terms of \emph{average-case} error, than the widely used bounding strategy of Amin et al. (2019).
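
As a rough illustration of the general clip-then-add-noise paradigm the abstract refers to (and not the optimal preprocessing strategy characterized in the paper), the Python sketch below estimates a mean under user-level epsilon-DP by collapsing each user's data to a clipped per-user mean, averaging across users, and adding Laplace noise. The bound B, the replace-one-user neighbouring relation, and the function name dp_user_level_mean are assumptions made for this example only.

    import numpy as np

    def dp_user_level_mean(user_samples, eps, B, rng=None):
        """Baseline user-level eps-DP mean estimate (illustrative sketch only).

        user_samples : list of 1-D arrays, one array of samples per user
        eps          : privacy parameter epsilon
        B            : assumed known upper bound, samples lie in [0, B]
        """
        rng = np.random.default_rng() if rng is None else rng
        n = len(user_samples)
        # Bound each user's contribution: clip samples to [0, B] and summarise
        # the user by a single per-user mean, which then also lies in [0, B].
        per_user_means = np.array(
            [np.mean(np.clip(np.asarray(x, dtype=float), 0.0, B)) for x in user_samples]
        )
        # Replacing one user's data changes the average of per-user means by at
        # most B / n, so Laplace noise of scale B / (n * eps) gives eps-DP
        # under the replace-one-user neighbouring relation (an assumption here).
        sensitivity = B / n
        noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
        return float(np.mean(per_user_means) + noise)

    # Example with heterogeneous users contributing different numbers of samples.
    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = [rng.uniform(0.0, 1.0, size=k) for k in (3, 50, 7, 1)]
        print(dp_user_level_mean(data, eps=1.0, B=1.0, rng=rng))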

V. Arvind Rameshwar, Anshoo Tandon

Computing Technology, Computer Technology

V. Arvind Rameshwar, Anshoo Tandon. Bounding User Contributions for User-Level Differentially Private Mean Estimation [EB/OL]. (2025-06-27) [2025-07-18]. https://arxiv.org/abs/2502.04749.
