Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model
Abstract
IMPORTANCE: Patients with prostate cancer are more likely to die of non-cancer causes than of prostate cancer itself. It is thus important to predict cause of death (COD) accurately in these patients. Random forest, a machine-learning model, has been useful for predicting binary cancer-specific death. However, its accuracy for predicting multi-category COD in prostate cancer patients is unclear.
OBJECTIVE: To develop and tune a machine-learning model for predicting 6-category COD in prostate cancer patients.
DESIGN, SETTING, AND PARTICIPANTS: We included patients in the Surveillance, Epidemiology, and End Results-18 cancer registry program with prostate cancer diagnosed in 2004 (followed up through 2016). They were randomly and equally divided into training and testing sets. We evaluated the prediction accuracies of random forest and conventional statistical (multinomial) models for 6-category COD in primary and cross-validation processes and by data-encoding type.
EXPOSURE: Tumor and patient characteristics.
MAIN OUTCOMES AND MEASURES: 13-year 6-category COD.
RESULTS: Among 49,864 men with prostate cancer, 29,611 (59.4%) were alive at the end of follow-up, 5,448 (10.9%) died of cardiovascular disease, 4,607 (9.2%) of prostate cancer, 3,681 (7.4%) of non-prostate cancer, 717 (1.4%) of infection, and 5,800 (11.6%) of other causes. We predicted 6-category COD among these patients with a mean accuracy of 59.1% (n=240; 95% CI, 58.7%-59.4%) in the random forest models with one-hot encoding, and 50.4% (95% CI, 49.7%-51.0%) in the multinomial models. Tumor characteristics, prostate-specific antigen level, and diagnosis-confirmation method were important in both random forest and multinomial models. In random forest models, no statistically significant differences were found between the accuracies of primary versus cross validation, or between those of conventional versus one-hot encoding.
CONCLUSION: For prostate cancer patients, we developed a random forest model with an accuracy of 59.1% in predicting long-term 6-category COD. It outperforms the conventional statistical (multinomial) models by an absolute prediction-accuracy difference of 8.7 percentage points.
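The figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the authors' code: it uses the counts and accuracies quoted above, and the variable and function names (`cod_counts`, `percent`, `abs_diff`) are hypothetical.

```python
# Counts reported in the abstract (13-year, 6-category cause of death).
cod_counts = {
    "alive": 29611,
    "cardiovascular disease": 5448,
    "prostate cancer": 4607,
    "non-prostate cancer": 3681,
    "infection": 717,
    "other causes": 5800,
}

# The six categories should add up to the reported cohort size.
total = sum(cod_counts.values())


def percent(count, denominator):
    """Share of the cohort, in percent, rounded to one decimal place."""
    return round(100 * count / denominator, 1)


shares = {category: percent(n, total) for category, n in cod_counts.items()}

# Mean accuracies (in percent) as reported in the abstract.
rf_accuracy = 59.1           # random forest, one-hot encoding
multinomial_accuracy = 50.4  # multinomial models

# Absolute difference in percentage points.
abs_diff = round(rf_accuracy - multinomial_accuracy, 1)

if __name__ == "__main__":
    print("cohort size:", total)
    for category, share in shares.items():
        print(f"{category}: {share}%")
    print("absolute accuracy difference:", abs_diff, "percentage points")
```

The category counts sum to the reported cohort size, the recomputed shares match the percentages given in the results, and the difference between the two mean accuracies matches the figure in the conclusion.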
Deng Fei, Zeng Fuqing, Wang Jianwei, Shanahan Andrew J., Zhang Lanjing
School of Electrical and Electronic Engineering, Shanghai Institute of Technology; Department of Urology, Wuhan Union Hospital of Tongji Medical College, Huazhong University of Science and Technology; Department of Urology, Beijing Jishuitan Hospital, the Fourth Medical College of Peking University; Department of Medicine, Princeton Medical Center; Department of Pathology, Princeton Medical Center || Department of Biological Sciences, Rutgers University || Rutgers Cancer Institute of New Jersey || Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University
Oncology; Medical research methods
Prostate cancer; cause-specific mortality; machine learning; prediction; prognosis
Deng Fei, Zeng Fuqing, Wang Jianwei, Shanahan Andrew J., Zhang Lanjing. Predicting long-term multicategory cause of death in patients with prostate cancer: random forest versus multinomial model [EB/OL]. (2025-03-28) [2025-05-08]. https://www.biorxiv.org/content/10.1101/2020.01.03.893966.