Risk prediction of schistosomiasis infection among preschool children in Uganda using random forests compared with the logistic regression classifier
Introduction: Schistosomiasis is a major public health concern in many tropical and sub-tropical regions of the world, including Uganda. In Uganda, the national prevalence estimates of schistosomiasis among pre-school children (PSC) were 31% and 41.9% in 2016 and 2017 respectively, making them the most at-risk population. The development of risk prediction models for schistosomiasis in Uganda and similar settings has not been explored, and existing models may not be applicable because exposures vary across settings.
Study objective: This study aimed to develop a risk prediction model for schistosomiasis infection among PSC in Uganda using random forests (RFs) and to compare its predictive accuracy with that of a logistic regression (LR) classifier.
Study significance: The developed model may be used by the Vector Control Division of the Ministry of Health, with the support of implementing partners, as a diagnostic tool for making individual-level decisions and executing targeted interventions.
Methods: National schistosomiasis prevalence survey data on PSC, collected in 2016 and 2017 by the Performance Monitoring and Accountability 2020 project of Makerere University School of Public Health (MakSPH) and Johns Hopkins School of Public Health, were used for this study. The data were analysed in R. An RF classifier was used to develop the model, and an LR classifier was fitted for comparison. Out-of-bag (OOB) error rate, accuracy, sensitivity, specificity, precision, F measure and area under the receiver operating characteristic curve (AUROC) were used to assess and compare the models using an evaluation set.
Results: The developed RF model had a 37.3% OOB error rate, 63% accuracy, 30% sensitivity, 81% specificity, 46% precision, 36% F measure and an AUROC of 0.547. The RF slightly outperformed the LR in classifying PSC as infected or not infected, though this difference was not statistically significant.
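
The evaluation metrics named above all derive from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those definitions, not the study's own code (the analysis was done in R). The class `ConfusionMatrix`, the function `f_measure` and the example counts are hypothetical and chosen only for illustration. The final lines check that the reported F measure of 36% is consistent with the reported precision (46%) and sensitivity (30%).

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (positive = infected)."""
    tp: int  # infected children predicted as infected
    fp: int  # uninfected children predicted as infected
    tn: int  # uninfected children predicted as uninfected
    fn: int  # infected children predicted as uninfected

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total

    def sensitivity(self) -> float:
        # Share of truly infected children that the model detects (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of truly uninfected children that the model clears.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of predicted infections that are true infections.
        return self.tp / (self.tp + self.fp)


def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (F1 score)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts for illustration only; these are NOT the study's data.
cm = ConfusionMatrix(tp=3, fp=2, tn=8, fn=7)
example_f = f_measure(cm.precision(), cm.sensitivity())

# Consistency check using the precision and sensitivity reported in the abstract.
reported_f = f_measure(precision=0.46, sensitivity=0.30)
print(f"F measure from reported values: {reported_f:.2f}")
```

Because the F measure is a harmonic mean, it is pulled towards the lower of the two inputs, which is why a sensitivity of 30% keeps it low even with a precision of 46%.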
Conclusions: The performance of the developed RF risk prediction model was not good enough to predict the risk of schistosomiasis infection among PSC. However, the model's specificity was much higher than its sensitivity, implying that it would work better as a diagnostic tool than as a screening tool. The LR and RF classifiers did not differ significantly in predicting schistosomiasis risk among PSC. Further studies should be conducted to explore better-performing schistosomiasis risk prediction models, for which this study may serve as a basis.