
Estimating Coverage in Streams via a Modified CVM Method

Source: arXiv
Abstract

When individuals in a population can be classified into classes or categories, the coverage of a sample, $C$, is defined as the probability that a randomly selected individual from the population belongs to a class represented in the sample. Estimating coverage is challenging because $C$ is not a fixed population parameter but a property of the sample, and the task becomes more complex when the number of classes is unknown. Furthermore, this problem has not been addressed in scenarios where data arrive as a stream, under the constraint that only $n$ elements can be stored at a time. In this paper, we propose a simple and efficient method to estimate $C$ in streaming settings, based on a straightforward modification of the CVM algorithm, which is commonly used to estimate the number of distinct elements in a data stream.
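
For orientation, the following is a minimal Python sketch of the two ingredients the abstract names. It is not the paper's method: the abstract does not describe the modification, so only the definition of coverage and the standard, unmodified CVM distinct-elements estimator are shown. The function names `true_coverage` and `cvm_distinct_estimate` and the parameters `class_probs`, `buffer_size`, and `rng` are illustrative assumptions; `buffer_size` plays the role of the storage limit $n$.

```python
import random


def true_coverage(sample, class_probs):
    """Coverage C: total probability of the classes represented in the sample.

    `class_probs` maps each class to its population probability. In practice
    these probabilities are unknown, which is why C has to be estimated.
    """
    seen = set(sample)
    return sum(p for cls, p in class_probs.items() if cls in seen)


def cvm_distinct_estimate(stream, buffer_size, rng=None):
    """Standard (unmodified) CVM estimate of the number of distinct elements.

    At most `buffer_size` elements are stored at any time. This is a sketch
    of the base algorithm only, not the coverage estimator from the paper.
    """
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    buffer = set()   # sampled elements

    for item in stream:
        # Forget any earlier decision about this item, then re-sample it.
        buffer.discard(item)
        if rng.random() < p:
            buffer.add(item)

        if len(buffer) >= buffer_size:
            # Buffer is full: keep each element with probability 1/2.
            buffer = {x for x in buffer if rng.random() < 0.5}
            p /= 2.0
            if len(buffer) >= buffer_size:
                # Nothing was evicted; the base algorithm treats this
                # very unlikely event as a failure.
                raise RuntimeError("CVM buffer still full after halving")

    return len(buffer) / p
```

When the buffer is larger than the number of distinct elements in the stream, no halving occurs and the estimate equals the exact distinct count; otherwise the result is a randomized estimate whose accuracy improves with `buffer_size`.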

Carlos Hernandez-Suarez

Subject: computing technology, computer technology

Carlos Hernandez-Suarez. Estimating Coverage in Streams via a Modified CVM Method [EB/OL]. (2025-04-06) [2025-06-08]. https://arxiv.org/abs/2504.04567.
