National Preprint Platform

A Conditional GAN for Tabular Data Generation with Probabilistic Sampling of Latent Subspaces

Source: arXiv
Abstract

The tabular form is the standard way of representing data in relational database systems and spreadsheets. However, like data in other forms, tabular data suffers from class imbalance, a problem that causes serious performance degradation in a wide variety of machine learning tasks. One of the most effective remedies is to use Generative Adversarial Networks (GANs) to synthesize artificial data instances for the under-represented classes. Despite their good performance, none of the previously proposed GAN models takes into account the vector subspaces that the input samples occupy in the real data space, so they may generate data in arbitrary locations. Moreover, class labels are treated in the same manner as the other categorical variables during training, which makes conditional sampling by class less effective. To overcome these problems, this study presents ctdGAN, a conditional GAN for alleviating class imbalance in tabular datasets. First, ctdGAN executes a space-partitioning step that assigns cluster labels to the input samples. It then uses these labels to synthesize samples via a novel probabilistic sampling strategy and a new loss function that penalizes both cluster and class mispredictions. In this way, ctdGAN is trained to generate samples in subspaces that resemble those of the original data distribution. We also introduce several other improvements, including a simple yet effective cluster-wise scaling technique that captures multiple feature modes without changing data dimensionality. An exhaustive evaluation of ctdGAN on 14 imbalanced datasets demonstrated its superiority in generating high-fidelity samples and improving classification accuracy.
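The abstract names four mechanisms without specifying them: a space-partitioning step that yields cluster labels, class- and cluster-aware probabilistic sampling, a loss that penalizes class and cluster mispredictions, and cluster-wise scaling that keeps the number of columns fixed. The NumPy sketch below shows one possible reading of each idea. It is not ctdGAN's implementation, and every choice in it is an assumption. Plain k-means stands in for the partitioning step. Per-cluster min-max scaling stands in for cluster-wise scaling. Cluster labels for a target class are drawn in proportion to the empirical P(cluster | class). The auxiliary loss is a weighted sum of two cross-entropies. All function names and the weight `w_cluster` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the space-partitioning step)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every sample to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_minmax_scale(X, labels):
    """Hypothetical cluster-wise scaling: min-max scale each feature within each cluster.

    The output keeps X's shape, so no extra mode-indicator columns are added.
    Per-cluster (min, span) pairs are returned so values can be mapped back.
    """
    Xs = np.empty(X.shape, dtype=float)
    stats = {}
    for c in np.unique(labels):
        mask = labels == c
        lo, hi = X[mask].min(axis=0), X[mask].max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)  # constant features: avoid divide-by-zero
        Xs[mask] = (X[mask] - lo) / span
        stats[int(c)] = (lo, span)
    return Xs, stats


def inverse_cluster_scale(Xs, labels, stats):
    """Map scaled (e.g. generated) rows back to the original feature ranges."""
    X = np.empty(Xs.shape, dtype=float)
    for c, (lo, span) in stats.items():
        mask = labels == c
        X[mask] = Xs[mask] * span + lo
    return X


def sample_conditions(y, labels, target_class, k, n, seed=0):
    """Draw n cluster labels for target_class with probability P(cluster | class)."""
    counts = np.bincount(labels[y == target_class], minlength=k).astype(float)
    if counts.sum() == 0:
        raise ValueError("target_class does not occur in y")
    rng = np.random.default_rng(seed)
    return rng.choice(k, size=n, p=counts / counts.sum())


def cross_entropy(logits, targets):
    """Mean categorical cross-entropy computed from raw logits (log-sum-exp trick)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()


def auxiliary_loss(class_logits, y_true, cluster_logits, c_true, w_cluster=1.0):
    """Hypothetical extra term penalising both class and cluster mispredictions."""
    return (cross_entropy(class_logits, y_true)
            + w_cluster * cross_entropy(cluster_logits, c_true))


# Toy data: two well-separated blobs; the minority class 1 lives only in the second blob.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(10.0, 1.0, (100, 2))])
y = np.array([0] * 190 + [1] * 10)

labels = kmeans(X, k=2)
Xs, stats = cluster_minmax_scale(X, labels)
conds = sample_conditions(y, labels, target_class=1, k=2, n=20)
print(Xs.shape, np.unique(conds))  # Xs has X's shape; conds holds only class 1's cluster
```

In a full conditional GAN, the sampled cluster labels would presumably be passed to the generator together with the target class as its conditioning input, and a term such as `auxiliary_loss` would be added to the adversarial objective. Those parts are omitted here. Drawing clusters in proportion to P(cluster | class) is one simple way to keep synthetic minority samples inside the regions where real minority samples occur.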

Leonidas Akritidis、Panayiotis Bozanis

Computing Technology, Computer Technology

Leonidas Akritidis, Panayiotis Bozanis. A Conditional GAN for Tabular Data Generation with Probabilistic Sampling of Latent Subspaces [EB/OL]. (2025-08-01) [2025-08-11]. https://arxiv.org/abs/2508.00472.