Research on a Breast Cancer Prediction Model Based on the LDA and XGBoost Algorithms
Breast cancer is the leading cause of cancer death in women, and the number of male breast cancer patients cannot be ignored either. Using information technology to predict the disease is therefore an important way to improve the diagnosis rate. This experiment applies dimensionality reduction to the multiple indicator features of the breast cancer dataset provided on Kaggle, analyzing 498 samples of 30-dimensional medical test indicators from breast cancer patients. Linear discriminant analysis (LDA) is used to merge feature attributes and project the data into a low-dimensional space. An eXtreme Gradient Boosting (XGBoost) prediction model is then built, with its optimal parameters obtained by grid search with cross-validation, and AdaBoost, random forest, and naive Bayes classifiers serve as comparison classifiers. The experimental results show that the classification accuracy of the prediction models trained after dimensionality reduction is on average 2.7% higher than before dimensionality reduction, and that the XGBoost prediction model gives the best classification performance, reaching 98.7%.
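The LDA projection step described in the abstract can be illustrated with a minimal sketch of two-class Fisher discriminant analysis. This is not the authors' implementation: the data below are made-up two-dimensional toy points (the study uses 30 features), and the helper names `mean`, `scatter`, `fisher_direction`, and `project` are hypothetical. The sketch assumes the within-class scatter matrix is invertible.

```python
# Two-class Fisher LDA on 2-D toy data, pure Python (illustrative only).

def mean(rows):
    """Component-wise mean of a list of equal-length points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, mu):
    """2x2 scatter matrix of 2-D points around their class mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Return w = Sw^{-1} (mu1 - mu0), the Fisher discriminant direction."""
    mu0, mu1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, mu0), scatter(class1, mu1)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed non-zero
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(rows, w):
    """Project each 2-D point onto the 1-D discriminant axis w."""
    return [r[0] * w[0] + r[1] * w[1] for r in rows]

# Hypothetical toy samples standing in for two diagnostic classes.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class0, class1)
z0 = project(class0, w)
z1 = project(class1, w)
print("direction:", w)
print("class 0 projections:", z0)
print("class 1 projections:", z1)
```

With two classes, LDA yields at most one discriminant axis, so each sample is reduced to a single value that a downstream classifier can be trained on in place of the original features.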
Guo Zhiheng, Ruan Xuling, Liu Qi, Yan Junfeng
Medical research methods; Oncology
breast cancer; dimension reduction; LDA; XGBoost; classification
Guo Zhiheng, Ruan Xuling, Liu Qi, Yan Junfeng. Research on a Breast Cancer Prediction Model Based on the LDA and XGBoost Algorithms [EB/OL]. (2021-09-15) [2025-05-04]. https://www.biomedrxiv.org.cn/article/doi/bmr.202106.00007.