Multirisk analysis of prostate cancer survival in Uganda
Uganda has registered the highest prostate cancer incidence and mortality rates, with an incidence rate of 37.0 per 100,000, an annual increase of 5.2% and a mortality-to-incidence ratio of 71%. This study investigated the effects of prostate cancer risk factors on patients' survival and predicted prostate cancer survival using the models developed. A retrospective review was carried out of the records of 150 prostate cancer patients who registered at the Uganda Cancer Institute between 1 January and 31 December 2013 and met the eligibility criteria. A total of 16 features were extracted for each patient, and risk factor analysis was performed on age at diagnosis, cancer stage at diagnosis, treatment option, smoking habits, alcohol abuse, family history, income level, obesity and HIV status, among others. Three machine learning classifiers, Logistic Regression, Naïve Bayes and k-nearest neighbours (KNN), were used to build the survival prediction models. The dataset was split into a 70% training set and a 30% set for testing and evaluation, using a random state of zero. This study showed that a larger proportion of patients were diagnosed at an advanced or intermediate stage, and that surgery was the most common treatment. Of the patients, 131 were smokers, 41 had a family history of prostate cancer, 59 were HIV negative, 63 had low income and 109 were non-obese. The overall one-year survival rate was 43%. Survival was 69%, 63%, 43%, 37% and 29% for men aged 40-49, 50-59, 60-69, 70-79 and 80-89, respectively. Better survival rates were observed among non-obese patients (55%), high-income earners (69%), patients with a master's degree or above (69%), patients without a family history (51%), smokers (47%), patients with infant disease (67%) and patients treated with hormones, either alone or in combination (53%).
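The age-bracket and subgroup survival figures are proportions of patients alive at one year within each stratum. A minimal sketch of that arithmetic follows, assuming each patient record is reduced to a hypothetical `(age, alive_at_one_year)` pair. The toy records are invented for illustration and are not the study data, and the crude proportion ignores censoring; the abstract does not state which survival estimator was used.

```python
from collections import defaultdict


def age_bracket(age: int) -> str:
    """Map an age in years to a ten-year bracket such as '60-69'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


def one_year_survival_by_bracket(records):
    """Return {bracket: proportion alive at one year} for (age, alive) pairs.

    This is a crude proportion per stratum; it does not adjust for
    censored follow-up (a Kaplan-Meier estimate would).
    """
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for age, alive in records:
        bracket = age_bracket(age)
        totals[bracket] += 1
        survivors[bracket] += int(alive)
    return {b: survivors[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy records (age at diagnosis, alive at one year); NOT the study data.
toy = [(45, True), (47, True), (48, False), (63, True), (66, False), (82, False)]

rates = one_year_survival_by_bracket(toy)
overall = sum(alive for _, alive in toy) / len(toy)  # overall one-year survival
print(rates)
print(round(overall, 2))
```

The same grouping works for any categorical risk factor (income level, obesity, family history) by replacing `age_bracket` with a function that returns that factor's category.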
The best survival outcomes were registered in younger patients, high-income earners, patients with infant disease, non-obese patients and patients without a family history of prostate cancer. Survival duration, income, obesity, cancer stage and age were the best predictors. The Logistic Regression model had a precision of 84%, a recall of 84%, an F1-score of 84% and an area under the ROC curve (AUC) of 81%; the Gaussian Naïve Bayes model had a precision of 83%, a recall of 82%, an F1-score of 82% and an AUC of 81%; and the KNN model had a precision of 66%, a recall of 64%, an F1-score of 65% and an AUC of 60%. Logistic Regression outperformed both Naïve Bayes and KNN and is therefore the best of the three models for predicting the survival outcomes of prostate cancer patients.
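The modelling workflow described above (a 70/30 split with a random state of zero, three classifiers, and precision, recall, F1-score and AUC on the held-out set) can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' code. The data are a synthetic stand-in of 150 rows by 16 features generated with `make_classification`. The weighted averaging of precision, recall and F1, the KNN neighbour count (`n_neighbors=5`) and the logistic regression iteration limit are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 150-patient, 16-feature table; NOT the study data.
X, y = make_classification(n_samples=150, n_features=16, n_informative=5, random_state=0)

# 70% training / 30% testing split with random_state=0, as described in the abstract.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The three classifier families named in the abstract; hyperparameters are assumptions.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gaussian Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard class labels for P/R/F1
    y_score = model.predict_proba(X_test)[:, 1]  # positive-class probability for AUC
    results[name] = {
        # Weighted averaging is an assumption about how the reported scores were computed.
        "precision": precision_score(y_test, y_pred, average="weighted"),
        "recall": recall_score(y_test, y_pred, average="weighted"),
        "f1": f1_score(y_test, y_pred, average="weighted"),
        "auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

On the synthetic data the scores will not match the reported figures; the intent is only to show where the real, numerically encoded patient features would replace `X` and `y`.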