KFDA-Boosting Algorithm for Imbalanced Data Classification
Imbalanced data distributions and nonlinear data characteristics make classification more difficult. In particular, minority-class samples in imbalanced data are hard to identify, which degrades overall classification performance. To address this problem, this paper proposes the KFDA-Boosting algorithm, which combines the ability of kernel Fisher discriminant analysis (KFDA) to effectively extract nonlinear sample features with the idea of the Boosting algorithm from ensemble learning. To verify the effectiveness and superiority of the algorithm on imbalanced data, the G-mean value and the precision and recall of the minority class are used as evaluation metrics. KFDA-Boosting is tested on 10 UCI datasets and compared experimentally with six other classification algorithms, including the support vector machine. The results show that, for imbalanced data classification, and especially for data with a high degree of imbalance or nonlinear characteristics, KFDA-Boosting identifies the minority class effectively compared with the other algorithms and, overall, achieves notable classification performance with good stability.
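The three evaluation metrics named in the abstract can be illustrated with a short sketch. The abstract does not spell out its formulas, so this assumes the commonly used binary definitions: minority-class precision TP/(TP+FP), minority-class recall TP/(TP+FN), and G-mean as the square root of minority recall times majority recall (specificity). The function name `minority_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
import math


def minority_metrics(y_true, y_pred, minority=1):
    """Precision, recall and G-mean with `minority` treated as the positive class.

    Assumes binary labels and the common definition
    G-mean = sqrt(recall_minority * recall_majority); the paper's exact
    formula is not given in the abstract.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == minority and t == minority:
            tp += 1  # minority sample correctly identified
        elif p == minority:
            fp += 1  # majority sample mislabelled as minority
        elif t == minority:
            fn += 1  # minority sample missed
        else:
            tn += 1  # majority sample correctly identified

    # Guard against empty denominators by returning 0.0 for that metric.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "g_mean": math.sqrt(recall * specificity),
    }


# Toy data (hypothetical): 4 minority samples (label 1), 16 majority (label 0).
y_true = [1] * 4 + [0] * 16
# 3 minority hits, 1 minority miss, 2 false alarms, 14 correct majority calls.
y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 14

scores = minority_metrics(y_true, y_pred)
print(scores)
```

On this toy input, plain accuracy would be 17/20 even though one in four minority samples is missed, which is why metrics centred on the minority class, such as those above, are preferred for imbalanced data.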
杨云鹏、王来、樊重俊、袁光辉
Computing technology, computer technology
Kernel Fisher discriminant analysis; ensemble learning; imbalanced data classification
杨云鹏,王来,樊重俊,袁光辉.面向不平衡数据分类的KFDA-Boosting算法[EB/OL].(2018-05-02)[2025-08-11].https://chinaxiv.org/abs/201805.00056.