
SMOGAN: Synthetic Minority Oversampling with GAN Refinement for Imbalanced Regression


Source: arXiv
Abstract

Imbalanced regression refers to prediction tasks where the target variable is skewed. This skewness hinders machine learning models, especially neural networks, which concentrate on dense regions and therefore perform poorly on underrepresented (minority) samples. Despite the importance of this problem, only a few methods have been proposed for imbalanced regression. Many of the available solutions for imbalanced regression adapt techniques from the class imbalance domain, such as linear interpolation and the addition of Gaussian noise, to create synthetic data in sparse regions. However, in many cases, the underlying distribution of the data is complex and non-linear. Consequently, these approaches generate synthetic samples that do not accurately represent the true feature-target relationship. To overcome these limitations, we propose SMOGAN, a two-step oversampling framework for imbalanced regression. In Stage 1, an existing oversampler generates initial synthetic samples in sparse target regions. In Stage 2, we introduce DistGAN, a distribution-aware GAN that serves as SMOGAN's filtering layer and refines these samples via adversarial loss augmented with a Maximum Mean Discrepancy objective, aligning them with the true joint feature-target distribution. Extensive experiments on 23 imbalanced datasets show that SMOGAN consistently outperforms the default oversampling method without the DistGAN filtering layer.
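The Maximum Mean Discrepancy term mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' DistGAN implementation: the Gaussian (RBF) kernel, the fixed `bandwidth`, the biased estimator, and the names `rbf_kernel` and `mmd_squared` are all assumptions chosen for illustration. The sketch only shows how an MMD score computed on joint feature-target rows separates synthetic samples that follow the real feature-target relationship from ones that do not.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(real, synthetic, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    k_rr = rbf_kernel(real, real, bandwidth)
    k_ss = rbf_kernel(synthetic, synthetic, bandwidth)
    k_rs = rbf_kernel(real, synthetic, bandwidth)
    return k_rr.mean() + k_ss.mean() - 2.0 * k_rs.mean()


def make_joint(rng, n, target_shift=0.0):
    """Toy data (hypothetical): rows are [x1, x2, x3, y] with y = sin(x1) + noise."""
    x = rng.normal(size=(n, 3))
    y = np.sin(x[:, :1]) + 0.1 * rng.normal(size=(n, 1)) + target_shift
    return np.hstack([x, y])


rng = np.random.default_rng(0)
real = make_joint(rng, 200)
good = make_joint(rng, 200)                    # same feature-target relationship
bad = make_joint(rng, 200, target_shift=2.0)   # targets pushed off the relationship

mmd_good = mmd_squared(real, good)
mmd_bad = mmd_squared(real, bad)
print(f"MMD^2 real vs. good: {mmd_good:.4f}")
print(f"MMD^2 real vs. bad:  {mmd_bad:.4f}")
```

In a GAN setting, a term of this kind would be added to the adversarial loss and computed with a differentiable framework so that gradients reach the generator; the NumPy version above is only meant to show what the score measures.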

Shayan Alahyari, Mike Domaratzki

Computing technology, computer technology

Shayan Alahyari, Mike Domaratzki. SMOGAN: Synthetic Minority Oversampling with GAN Refinement for Imbalanced Regression [EB/OL]. (2025-04-29) [2025-05-22]. https://arxiv.org/abs/2504.21152.
