A machine learning-based risk prediction model for type 2 diabetes mellitus among young adults in Uganda
Date
2025
Authors
Nassali, Joanita
Publisher
Makerere University
Abstract
Background: Type 2 Diabetes Mellitus (T2DM) is a growing global health concern, and its prevalence is rising among young adults in Uganda. Machine learning algorithms have demonstrated potential in predicting T2DM risk, but their application among young adults in Uganda remains limited.
Objectives: This study aimed to (1) develop and evaluate machine learning-based type 2 diabetes risk prediction models for young adults, (2) compare the models' performance with previous research on diabetes risk prediction, using online diabetes databases, and (3) determine type 2 diabetes risk predictors among young adults attending outpatient clinics at Mulago National Referral Hospital.
Methodology: This retrospective study extracted data from the medical records of young adults in the outpatient registers of Mulago National Referral Hospital. Supervised machine learning techniques, including Naïve Bayes (NB), random forest (RF), logistic regression (LR), support vector machines (SVM), and decision trees (DT), were applied to build the risk prediction models. Performance metrics, such as accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC-ROC), were used to evaluate the models' predictive capabilities. The models were then compared with previous research on diabetes risk prediction. SHAP values were used to make the logistic regression model interpretable by quantifying the contribution of each feature to its predictions.
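The four evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = diabetic, 0 = not diabetic), a 0.5 decision threshold, and a small set of made-up risk scores. The function names are hypothetical and this is not the study's code or data.

```python
from typing import Dict, Sequence


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive case scores
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and AUC-ROC for binary predictions."""
    # Turn risk scores into class predictions at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc_roc": auc_roc(y_true, y_score),
    }


# Illustrative (made-up) labels and predicted risk scores for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

metrics = classification_metrics(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which may be why it is the metric used for comparison with earlier studies.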
Results: Of the five supervised machine learning classification models studied, logistic regression and random forest emerged as the most effective, offering both high performance and interpretability. The logistic regression model in this study also outperformed other models, with higher AUC-ROC values than those reported by Chang et al. (0.86), Tigga and Gard et al. (0.92), and Zhu et al. (0.85). Random forest, however, performed better when the study data and the online data were compared. Key predictors of type 2 diabetes included age, body mass index (BMI), and systolic and diastolic blood pressure; higher BMI and elevated systolic blood pressure were associated with an increased risk of developing type 2 diabetes. Hypertension, family relationship, family history, and cardiovascular disease also showed very strong positive correlations with diabetes status.
Conclusion: This study shows that machine learning, especially Logistic Regression, is highly effective in predicting Type 2 Diabetes in young adults in Uganda. It is recommended to integrate such predictive models into routine screenings and focus public health efforts on managing BMI and blood pressure for diabetes prevention.
Description
A dissertation submitted to Makerere University in partial fulfillment of the requirements for the award of a Master’s in Health Informatics (MHI).
Citation
Nassali, J. (2025). A machine learning-based risk prediction model for type 2 diabetes mellitus among young adults in Uganda (Unpublished master's dissertation). Makerere University, Kampala, Uganda.