Developing a prediction model to detect the likelihood of early-stage breast cancer using machine learning techniques at the Uganda Cancer Institute
Background: Breast cancer is the most common malignancy affecting women worldwide, with over one million cases occurring annually. It is the second most common cause of cancer-related death in the world. In Uganda, it is the second most common cancer among women after cancer of the cervix. Most women present after developing incurable, metastatic tumors. Although early diagnosis increases the chances of survival, screening facilities are limited. With advances in computing technology, Machine Learning (ML) has been extended to the biomedical field to predict various health outcomes. Therefore, this study aimed to develop a web-based application to predict a woman's likelihood of developing breast cancer using machine learning.
Methods: This was a retrospective study that involved retrieval and review of 1897 patients' files with 22 variables at the Uganda Cancer Institute. The six-stage Cross Industry Standard Process for Data Mining (CRISP-DM) methodology was adopted and applied to twenty-five different classification algorithms using the Weka tool. The classifier categories used were Bayes, Meta, Functions, Lazy, Trees, and Rules. Models were evaluated and compared based on performance. The best-performing model was integrated into a web application to make predictions on breast cancer.
Results: The experimental results showed that random forest and Logistic Model Tree (LMT) had comparable results. However, when the models were further evaluated on accuracy, F-score and the ROC curve using 10-fold cross-validation (CV), random forest outperformed the other models with an accuracy of 99.68%, an F-score of 0.997 and an area under the ROC curve of 1.0, while LMT achieved 99.47%, 0.995 and 0.997 on the same metrics. The Trees category performed better than the other classifier categories, since both random forest and LMT belong to it. The random forest algorithm was integrated into a web application to enhance screening of women at risk of developing breast cancer.
Conclusions: ML techniques are essential in the medical field because they enhance early identification of high-risk individuals based on known clinical risk factors. Therefore, the random forest model can be integrated into health care to support health workers during breast cancer patient management and when assigning a therapy.
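
The 10-fold cross-validation protocol described under Results can be sketched as follows. This is a minimal, standard-library-only illustration and not the study's Weka pipeline: the data are synthetic, a toy 1-nearest-neighbour classifier stands in for random forest, the F-score is computed for the positive class only, and the ROC metric is omitted. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def nearest_neighbour_predict(train_X, train_y, x):
    """Toy stand-in classifier (1-NN); assumption: NOT the study's random forest."""
    best = min(
        range(len(train_X)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)),
    )
    return train_y[best]


def cross_validate(X, y, k=10, positive=1):
    """Pool predictions over k held-out folds; return (accuracy, F-score)."""
    tp = fp = fn = tn = 0
    for fold in stratified_folds(y, k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in fold:
            pred = nearest_neighbour_predict(train_X, train_y, X[i])
            if pred == positive and y[i] == positive:
                tp += 1
            elif pred == positive:
                fp += 1
            elif y[i] == positive:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_score = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f_score


# Synthetic two-class data (assumption: stands in for the 22 clinical variables).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
X += [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(100)]
y = [0] * 100 + [1] * 100

accuracy, f_score = cross_validate(X, y, k=10)
print(f"accuracy={accuracy:.4f}  F-score={f_score:.4f}")
```

Every record is held out exactly once, so the pooled accuracy and F-score are computed only from predictions on data the classifier did not see during training. Weka's reported F-score may be a class-weighted average, so figures from this sketch would not be directly comparable with those in the abstract.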