Predicting switch to second-line antiretroviral therapy regimen: A comparison of the traditional linear classification methods and advance nonlinear machine learning algorithms

Nansereko, Brendah

dc.contributor.author	Nansereko, Brendah
dc.date.accessioned	2022-12-19T09:57:27Z
dc.date.available	2022-12-19T09:57:27Z
dc.date.issued	2022
dc.identifier.citation	Nansereko, B. (2022). Predicting switch to second-line antiretroviral therapy regimen: A comparison of the traditional linear classification methods and advance nonlinear machine learning algorithms. (Unpublished master's dissertation). Makerere University, Kampala, Uganda.	en_US
dc.identifier.uri	http://hdl.handle.net/10570/11152
dc.description	A dissertation submitted to the School of Public Health in partial fulfillment of the requirements for the Degree of Master of Biostatistics, Makerere University, Kampala	en_US
dc.description.abstract	Introduction: As data patterns have changed, research has increasingly adopted self-learning approaches, commonly known as machine learning (ML). ML has previously been used to predict health outcomes such as early virological outcomes, among others. These deterministic methodologies use a wide array of features to identify hidden patterns in the data and predict health outcomes. They also incorporate chance and variation arising from fluctuations in the environment, including factors not explicitly included in the model. In contrast, classical linear methods have several limitations: they assume linearity, fail to fully incorporate heterogeneity of effects, and struggle with the growing dimensionality of data because they cannot accommodate large numbers of predictors. This study aimed to compare linear and advanced nonlinear classification algorithms for predicting a switch to a second-line antiretroviral therapy (ART) regimen. Objectives: The objective of this study was to compare parametric linear logistic regression with non-parametric advanced ML algorithms, namely random forests (RF) and K-nearest neighbors (KNN), in correctly classifying patients who switch to second-line ART regimens. Methods: This study used secondary data on HIV patients from 15 HIV clinics under RHSP. R, STATA, and Python were used for data management and analysis. Logistic regression, random forest, and K-nearest neighbor models were fitted. The models were compared on their discriminative ability and evaluated on average performance metrics: area under the curve (AUC), sensitivity, F1 score, and overall accuracy. Results: The majority of the patients (62.4%) were female, and most (52.1%) were aged 20-34 years at enrollment.
Out of the 7818 patients, 5% had switched to a second-line ART regimen. Comparison of the fitted models indicated that all the models performed better with balanced data than with imbalanced data. The area under the curve (AUC) for the balanced-data logistic classifier, 68.8% (95% CI 68.0 – 69.2), was significantly higher than those of the balanced-data RF model, 56.9% (95% CI 53.4 – 58.6), and the balanced-data KNN model, 65.1% (95% CI 64.3 – 65.6). There was no statistically significant difference in the F1 measure across the three models. However, the balanced-data logistic classifier had the highest AUC and recall of all the models. Conclusion: This study indicated that parametric linear classifiers such as logistic regression are good predictors of the switch to a second-line ART regimen when appropriate resampling strategies, such as the Synthetic Minority Oversampling Technique (SMOTE), are applied to balance the classes in imbalanced data.	en_US
dc.language.iso	en	en_US
dc.publisher	Makerere University	en_US
dc.subject	Antiretroviral Therapy	en_US
dc.subject	Regimen	en_US
dc.subject	Traditional Linear	en_US
dc.subject	classification Methods	en_US
dc.subject	nonlinear	en_US
dc.subject	Machine Learning	en_US
dc.subject	Algorithms	en_US
dc.subject	HIV/AIDS	en_US
dc.title	Predicting switch to second-line antiretroviral therapy regimen: A comparison of the traditional linear classification methods and advance nonlinear machine learning algorithms	en_US
dc.type	Thesis	en_US
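
Illustrative note on the metrics named in the abstract: the sketch below is not the dissertation's code or data. It uses only the two figures reported above (7818 patients, about 5% switched) to show why overall accuracy alone can mislead on imbalanced data, and why sensitivity, F1 score, and AUC are reported alongside it. The count of 391 switched patients is an assumption (5% of 7818, rounded), and the function names are hypothetical.

```python
# Illustrative sketch only: not the dissertation's code or data.
# Labels are coded 1 = switched to second-line ART, 0 = did not switch.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def summary_metrics(y_true, y_pred):
    """Overall accuracy, sensitivity (recall), precision, and F1 score."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

N_PATIENTS = 7818
N_SWITCHED = 391  # assumption: 5% of 7818, rounded; exact count not given

y_true = [1] * N_SWITCHED + [0] * (N_PATIENTS - N_SWITCHED)

# A trivial classifier that never predicts a switch.
always_no_switch = [0] * N_PATIENTS
baseline = summary_metrics(y_true, always_no_switch)
print(baseline)

# AUC on a four-patient toy example with made-up scores.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

Under these assumptions, the trivial classifier reaches roughly 95% accuracy while identifying no switching patients at all (sensitivity and F1 are both zero), which is consistent with the abstract's emphasis on resampling and on metrics beyond accuracy.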

Files in this item

Name: Nanserko-CHS-MBIO.pdf
Size: 2.200Mb
Format: PDF
Description: Master's Dissertation

This item appears in the following Collection(s)

School of Public Health (Public-Health) Collections
