Fast, Space-Optimal Streaming Algorithms for Clustering and Subspace Embeddings
Fast, Space-Optimal Streaming Algorithms for Clustering and Subspace Embeddings
We show that both clustering and subspace embeddings can be performed in the streaming model with the same asymptotic efficiency as in the central/offline setting. For $(k, z)$-clustering in the streaming model, we achieve a number of words of memory which is independent of the number $n$ of input points and the aspect ratio $\Delta$, yielding an optimal bound of $\tilde{\mathcal{O}}\left(\frac{dk}{\min(\varepsilon^4,\varepsilon^{z+2})}\right)$ words for accuracy parameter $\varepsilon$ on $d$-dimensional points. Additionally, we obtain amortized update time of $d\,\log(k)\cdot\text{polylog}(\log(n\Delta))$, which is an exponential improvement over the previous $d\,\text{poly}(k,\log(n\Delta))$. Our method also gives the fastest runtime for $(k,z)$-clustering even in the offline setting. For subspace embeddings in the streaming model, we achieve $\mathcal{O}(d)$ update time and space-optimal constructions, using $\tilde{\mathcal{O}}\left(\frac{d^2}{\varepsilon^2}\right)$ words for $p\le 2$ and $\tilde{\mathcal{O}}\left(\frac{d^{p/2+1}}{\varepsilon^2}\right)$ words for $p>2$, showing that streaming algorithms can match offline algorithms in both space and time complexity.
Vincent Cohen-Addad、Liudeng Wang、David P. Woodruff、Samson Zhou
计算技术、计算机技术
Vincent Cohen-Addad,Liudeng Wang,David P. Woodruff,Samson Zhou.Fast, Space-Optimal Streaming Algorithms for Clustering and Subspace Embeddings[EB/OL].(2025-04-22)[2025-06-06].https://arxiv.org/abs/2504.16229.点此复制
评论