Responsible disease modeling and prediction of Cardiovascular diseases
Date
2025
Authors
Mbabazi, Elizabeth Shirley
Publisher
Makerere University
Abstract
The growing number of deaths from cardiovascular diseases (CVDs) in low- and middle-income countries and in developed countries alike is alarming. Machine Learning (ML) models have been developed for the early diagnosis of CVDs, but their success has been limited: the models are black boxes, so cardiovascular health experts place little trust in them, which hinders their acceptance. This study focused on developing explainable AI models that predict the likelihood of acquiring a CVD over a ten-year period, and on selecting the best-performing one. The open cardiovascular disease study dataset was used to develop multiple models, namely K-Nearest Neighbors (KNN), Logistic Regression, XGBoost, CatBoost, Random Forest, Naive Bayes, AdaBoost, Support Vector Machine, Gradient Boosting Machine, Long Short-Term Memory and Decision Tree. Model performance was assessed using the F1-score, accuracy, Area Under the Curve, precision, recall (sensitivity), specificity and the confusion matrix. The Random Forest model performed best, with an accuracy of 98%, followed by XGBoost with 89% and KNN with 88%. Explainable AI (XAI) techniques, namely SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP), Individual Conditional Expectation (ICE) and Local Interpretable Model-agnostic Explanations (LIME), were then applied to all the models to understand how they arrived at their predictions, opening up the black box. This research contributes to identifying CVD risk factors through feature learning and XAI, supporting early diagnosis and therefore early intervention.
According to the models' predictions, the leading risk factors are age, sex, systolic blood pressure (SysBP) and cigarettes per day, whereas diabetes, total cholesterol, blood pressure medication (BPMeds) and prevalent stroke contribute the least, implying that they are less important in acquiring CVDs.
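As a minimal sketch of how the evaluation metrics named in the abstract relate to a binary confusion matrix, the Python below derives each of them from the four cell counts. The function name `classification_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from the dissertation, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive common evaluation metrics from binary confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives,
    true negatives and false negatives (all assumed > 0 where divided by).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # recall and sensitivity are the same quantity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only; NOT results from the study.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced outcomes, as in this made-up example, accuracy can be high while precision and recall are noticeably lower, which is one reason to report several metrics side by side.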
Description
A dissertation submitted to the Directorate of Research and Graduate Training in partial fulfilment of the requirements for the Degree of Master of Science in Computer Science (Track: Data Science & Artificial Intelligence) of Makerere University.
Citation
Mbabazi, E. S. (2025). Responsible disease modeling and prediction of Cardiovascular diseases (Unpublished master’s dissertation). Makerere University, Kampala, Uganda.