On Fine-Grained Distinct Element Estimation
On Fine-Grained Distinct Element Estimation
We study the problem of distributed distinct element estimation, where $α$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $Î\left(α\log n+\fracα{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \fracβ{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $\mathcal{O}\left(α\log n+\frac{\sqrtβ}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.
Ilias Diakonikolas、Daniel M. Kane、Jasper C. H. Lee、Thanasis Pittas、David P. Woodruff、Samson Zhou
计算技术、计算机技术
Ilias Diakonikolas,Daniel M. Kane,Jasper C. H. Lee,Thanasis Pittas,David P. Woodruff,Samson Zhou.On Fine-Grained Distinct Element Estimation[EB/OL].(2025-06-27)[2025-07-22].https://arxiv.org/abs/2506.22608.点此复制
评论