Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference

Source: arXiv
Abstract

With the rapid growth of large-scale data in federated ecosystems, traditional principal component analysis (PCA) is often not applicable because of privacy protection considerations and its heavy computational burden. Algorithms have been proposed to lower the computational cost, but few can handle both high dimensionality and massive sample size in distributed settings. In this paper, we propose the FAst DIstributed (FADI) PCA method for federated data when both the dimension $d$ and the sample size $n$ are ultra-large, by simultaneously performing parallel computing along $d$ and distributed computing along $n$. Specifically, we use $L$ parallel copies of $p$-dimensional fast sketches to divide the computing burden along $d$ and aggregate the results distributively along the split samples. We present a general framework applicable to multiple statistical problems and establish comprehensive theoretical results under it. We show that FADI accelerates the computation while enjoying the same non-asymptotic error rate as traditional PCA when $Lp \ge d$. We also derive inferential results that characterize the asymptotic distribution of FADI and show a phase-transition phenomenon as $Lp$ increases. We perform extensive simulations to empirically validate our theoretical findings and apply FADI to the 1000 Genomes data to study the population structure.
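
The sketch-and-aggregate idea described in the abstract can be illustrated with a small NumPy example. This is only a minimal sketch under stated assumptions, not the authors' FADI implementation: the function names `sketched_pca` and `subspace_distance`, the Gaussian test matrices, the use of the sample covariance as the target matrix, and the way the $L$ sketches are combined (stacking the per-sketch bases and taking one more SVD) are all choices made here for illustration. Each data split only contributes a $d \times p$ product, so no site has to form the full $d \times d$ covariance.

```python
import numpy as np


def sketched_pca(data_splits, K, p, L, seed=0):
    """Illustrative sketch-and-aggregate PCA (hypothetical helper, not the paper's code).

    data_splits : list of (n_s x d) arrays, one per site (assumed centered).
    K           : number of principal components to estimate (K <= p).
    p           : dimension of each random sketch.
    L           : number of parallel sketches.
    """
    rng = np.random.default_rng(seed)
    d = data_splits[0].shape[1]
    n = sum(X.shape[0] for X in data_splits)

    bases = []
    for _ in range(L):
        # Assumption: a Gaussian test matrix plays the role of the "fast sketch".
        omega = rng.standard_normal((d, p))
        # Each site computes X_s^T (X_s omega) locally; only d x p matrices are summed.
        Y = sum(X.T @ (X @ omega) for X in data_splits) / n
        # Top-K left singular vectors of the aggregated d x p sketch.
        U, _, _ = np.linalg.svd(Y, full_matrices=False)
        bases.append(U[:, :K])

    # Combine the L sketches: the top-K left singular vectors of the stacked bases
    # equal the top-K eigenvectors of the averaged projection matrices.
    stacked = np.hstack(bases) / np.sqrt(L)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :K]


def subspace_distance(A, B):
    """Frobenius distance between the projectors onto span(A) and span(B)."""
    return np.linalg.norm(A @ A.T - B @ B.T)


# Toy usage: d = 30 features, two strong directions, data split across 4 sites.
rng = np.random.default_rng(1)
d, n, K = 30, 2000, 2
variances = np.ones(d)
variances[:K] = [200.0, 150.0]
X = rng.standard_normal((n, d)) * np.sqrt(variances)
splits = np.array_split(X, 4)

V_sketch = sketched_pca(splits, K=K, p=10, L=5)  # here L * p = 50 >= d = 30
```

In this toy setting the estimate can be compared against the leading eigenvectors of the full sample covariance with `subspace_distance`; the parameter values (`p=10`, `L=5`, four splits) are arbitrary and chosen only so that $Lp \ge d$ holds.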

Shuting Shen, Junwei Lu, Xihong Lin

Computing technology, computer technology

Shuting Shen, Junwei Lu, Xihong Lin. Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference [EB/OL]. (2025-08-26) [2025-09-06]. https://arxiv.org/abs/2306.06857.