Dissimilar Batch Decompositions of Random Datasets

Abstract

To improve learning, large datasets are often split into small batches that are fed sequentially to a predictive model. In this paper, we study such batch decompositions from a probabilistic perspective. We assume that data points (possibly corrupted) are drawn independently from a given space, and we define a notion of similarity between two data points. We then consider decompositions that restrict the amount of similarity within each batch and obtain high-probability bounds on the minimum size of such a decomposition. We demonstrate an inherent tradeoff between relaxing the similarity constraint and the overall size, and we also use martingale methods to obtain bounds on the maximum size of data subsets with a given similarity.
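To make the setting concrete, here is a minimal Python sketch built on assumptions that are ours, not the paper's: points are drawn uniformly from the unit square, two points count as similar when their Euclidean distance is below a threshold `r`, the size of a decomposition is taken to be its number of batches, and each batch may contain at most `max_similar_pairs` similar pairs. The names `is_similar` and `greedy_batches` are hypothetical, and the greedy pass only gives an upper bound on the minimum number of batches; it is not the author's method.

```python
import math
import random

def is_similar(p, q, r):
    """Hypothetical similarity: two points are similar if their distance is below r."""
    return math.dist(p, q) < r

def greedy_batches(points, r, max_similar_pairs=0):
    """Place each point in the first batch whose similar-pair count stays
    at most max_similar_pairs after adding it; otherwise open a new batch.
    The number of batches returned upper-bounds the minimum possible."""
    batches = []  # each batch is a list of points
    counts = []   # number of similar pairs inside each batch
    for p in points:
        for i, batch in enumerate(batches):
            # Similar pairs that p would create with points already in this batch
            added = sum(1 for q in batch if is_similar(p, q, r))
            if counts[i] + added <= max_similar_pairs:
                batch.append(p)
                counts[i] += added
                break
        else:
            # No existing batch can absorb p without violating the constraint
            batches.append([p])
            counts.append(0)
    return batches

if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(500)]
    for k in (0, 5, 20):
        print(k, len(greedy_batches(pts, r=0.05, max_similar_pairs=k)))
```

Raising `max_similar_pairs` loosens the similarity constraint and typically lets the greedy pass use fewer batches, which loosely mirrors the tradeoff described in the abstract. With `max_similar_pairs=0`, the procedure reduces to greedy coloring of the similarity graph.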

Author: Ghurumuruhan Ganesan

Subject classification: Computing technology; computer technology

Suggested citation: Ghurumuruhan Ganesan. Dissimilar Batch Decompositions of Random Datasets [EB/OL]. (2025-04-09) [2025-04-30]. https://arxiv.org/abs/2504.06991.
