NOMAD Projection

Source: arXiv
Abstract

The rapid adoption of generative AI has driven an explosion in the size of datasets consumed and produced by AI models. Traditional methods for unstructured data visualization, such as t-SNE and UMAP, have not kept up with the pace of dataset scaling. This presents a significant challenge for AI explainability, which relies on methods such as t-SNE and UMAP for exploratory data analysis. In this paper, we introduce Negative Or Mean Affinity Discrimination (NOMAD) Projection, the first method for unstructured data visualization via nonlinear dimensionality reduction that can run on multiple GPUs at train time. We provide theory that situates NOMAD Projection as an approximate upper bound on the InfoNC-t-SNE loss, and empirical results that demonstrate NOMAD Projection's superior performance and speed profile compared to existing state-of-the-art methods. We demonstrate the scalability of NOMAD Projection by computing the first complete data map of Multilingual Wikipedia.

Brandon Duderstadt, Zach Nussbaum, Laurens van der Maaten

Computing technology, computer technology

Brandon Duderstadt, Zach Nussbaum, Laurens van der Maaten. NOMAD Projection [EB/OL]. (2025-05-21) [2025-06-15]. https://arxiv.org/abs/2505.15511.
