Sliced-Wasserstein Distance-based Data Selection
We propose a new unsupervised anomaly detection method based on the sliced-Wasserstein distance for selecting training data in machine learning. Our filtering technique is well suited to decision-making pipelines that deploy machine learning models in critical sectors, e.g., power systems, because it offers conservative data selection and an optimal transport interpretation. To ensure the scalability of our method, we provide two efficient approximations. The first processes reduced-cardinality representations of the datasets concurrently. The second uses a computationally light Euclidean distance approximation. Additionally, we release the first open dataset showcasing localized critical peak rebate demand response in a northern climate. We present the filtering patterns of our method on synthetic datasets and numerically benchmark it for training data selection. Finally, we employ our method as part of a first forecasting benchmark for our open-source dataset.
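For readers unfamiliar with the distance the abstract builds on, the following is a minimal sketch of a Monte Carlo estimate of the sliced-Wasserstein distance between two empirical point clouds. It is not the authors' filtering procedure or either of their approximations; the function name `sliced_wasserstein`, its parameters, and the equal-sample-size restriction are assumptions made for illustration only.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the order-p sliced-Wasserstein distance.

    Illustrative sketch: assumes x and y are point clouds of identical
    shape (n_samples, n_features) with uniform weights.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("this sketch assumes two 2-D arrays of equal shape")

    rng = np.random.default_rng(seed)
    # Draw random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both clouds onto every direction, then sort each projection.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)

    # In 1-D, W_p^p between equal-size empirical measures is the mean
    # of |difference|^p between the sorted samples.
    wp_p = np.mean(np.abs(x_proj - y_proj) ** p, axis=0)

    # Average over directions, then take the p-th root.
    return float(np.mean(wp_p) ** (1.0 / p))
```

Sorting the one-dimensional projections is what keeps the estimate cheap: each slice costs a sort rather than a full optimal transport solve. A distance of this kind could then be compared across candidate subsets of a dataset, although how the paper turns it into an anomaly score is not specified in the abstract.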
Julien Pallage, Antoine Lesage-Landry
Power transmission and distribution engineering; power generation and power plants; computing and computer technology
Julien Pallage, Antoine Lesage-Landry. Sliced-Wasserstein Distance-based Data Selection [EB/OL]. (2025-04-17) [2025-06-07]. https://arxiv.org/abs/2504.12918.