National Preprint Platform (国家预印本平台)

Research Progress on Classification Techniques for Imbalanced Data Sets (失衡数据集分类技术研究进展)

Abstract

Imbalanced data sets (IDS) are a form of data that arises naturally in data mining and machine learning and that appears widely across many real-world domains. This paper describes the characteristics of imbalanced data sets and analyzes in detail, from the two perspectives of data distribution and decision space, the problems that imbalanced data poses for classification. It then discusses methods for improving classifier performance from two angles, data resampling and improvements to classification algorithms, and reviews the main methods and strategies that researchers in China and abroad have proposed for classifying imbalanced data. Finally, it contrasts traditional classifier performance measures with measures suited to imbalanced data sets.
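The two themes of the abstract, resampling and evaluation, can be illustrated with a minimal sketch. It is not taken from the paper: the function names `random_oversample` and `imbalance_metrics` are hypothetical, random oversampling stands in for whichever resampling methods the paper surveys, and recall, F-measure and G-mean are assumed here as typical imbalance-aware measures, since the abstract does not name the ones it covers.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compare plain accuracy with measures that account for the
    minority (positive) class. Assumes y_true is non-empty."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }


# A classifier that always predicts the majority class on a 95:5 data set
# scores high on accuracy while never finding a minority sample.
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100
print(imbalance_metrics(y_true, always_majority))

# Oversampling the same labels yields equal class counts.
X_bal, y_bal = random_oversample(list(range(100)), y_true)
print(Counter(y_bal))
```

On the 95:5 example the always-majority classifier reaches 0.95 accuracy while its recall, F-measure and G-mean are all 0, which is the kind of gap that motivates separate measures for imbalanced data.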

Authors: 孙渤禹 (Sun Boyu), 李鹏 (Li Peng), 黄久玲 (Huang Jiuling)

Subject: Computing technology; computer technology

Keywords: imbalanced data sets; classification problem; data resampling

Cite as: 孙渤禹, 李鹏, 黄久玲. 失衡数据集分类技术研究进展 [EB/OL]. (2013-12-02) [2025-08-24]. http://www.paper.edu.cn/releasepaper/content/201312-19.
