On feature selection in double-imbalanced data settings: a Random Forest approach

Source: arXiv
Abstract

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional asymmetry in the data $(n \gg p)$. In such scenarios, traditional feature selection methods applied to Random Forests (RF) often yield unstable or misleading importance rankings. This paper proposes a novel thresholding scheme for feature selection based on minimal depth, which exploits the tree topology to assess variable relevance. Extensive experiments on simulated and real-world datasets demonstrate that the proposed approach produces more parsimonious and accurate subsets of variables compared to conventional minimal depth-based selection. The method provides a practical and interpretable solution for variable selection in RF under double imbalance conditions.
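
As a rough illustration of the quantity the abstract builds on, the sketch below computes the conventional mean minimal depth of each feature across the trees of a Random Forest. It is not the thresholding scheme proposed in the paper, which the abstract does not specify. The use of scikit-learn, the synthetic imbalanced dataset, the helper names `tree_minimal_depth` and `forest_minimal_depth`, and the convention of assigning a tree's maximum depth to features that never split in it are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_minimal_depth(estimator, n_features):
    """Depth of the shallowest split on each feature in one fitted tree."""
    t = estimator.tree_
    min_depth = np.full(n_features, np.nan)
    stack = [(0, 0)]  # (node_id, depth); the root is node 0 at depth 0
    while stack:
        node, depth = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # leaf node: both children are -1
            continue
        f = t.feature[node]
        if np.isnan(min_depth[f]) or depth < min_depth[f]:
            min_depth[f] = depth
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    # Assumption: a feature that never splits in this tree is assigned
    # the tree's maximum depth.
    return np.where(np.isnan(min_depth), t.max_depth, min_depth)


def forest_minimal_depth(forest):
    """Mean minimal depth of each feature over all trees in the forest."""
    p = forest.n_features_in_
    per_tree = np.array([tree_minimal_depth(e, p) for e in forest.estimators_])
    return per_tree.mean(axis=0)


# Synthetic data with n >> p and a 90/10 class split (illustrative only).
# With shuffle=False the 3 informative features occupy columns 0, 1, 2.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9, 0.1], shuffle=False, random_state=0,
)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

md = forest_minimal_depth(rf)
ranking = np.argsort(md)  # smaller minimal depth = split closer to the root
print("mean minimal depth:", np.round(md, 2))
print("features ranked by relevance:", ranking)
```

A lower mean minimal depth indicates a feature that tends to split close to the root. A selection rule would then keep the features whose value falls below some threshold, and the choice of that threshold is what the paper addresses.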

Fabio Demaria

Computing technology, computer technology

Fabio Demaria. On feature selection in double-imbalanced data settings: a Random Forest approach [EB/OL]. (2025-06-12) [2025-07-16]. https://arxiv.org/abs/2506.10929.
