Enhancing outbreak surveillance through integration of natural language processing in Uganda’s electronic integrated disease surveillance and response system.
Date
2026-01-13
Authors
Nakitandwe, Rebecca Melisa
Publisher
Makerere University.
Abstract
Introduction: Early detection of infections is essential to prevent infectious diseases from escalating into large outbreaks. In Uganda, the Electronic Integrated Disease Surveillance and Response (eIDSR) system enables community-level reporting of suspected cases via SMS. However, these unstructured messages are processed manually, which creates a bottleneck that delays outbreak detection and response, particularly during high-volume reporting periods. Such delays can increase morbidity and mortality, especially in resource-limited settings. This study aimed to integrate Natural Language Processing (NLP) into the eIDSR system to automate the extraction of key information, such as disease type, location, and symptoms, from incoming SMS alerts. It also sought to understand the contextual factors that influence model accuracy and performance.
Methods: A retrospective design was employed using historical SMS data submitted to the eIDSR system in 2024. A Bidirectional Encoder Representations from Transformers (BERT)-uncased model was fine-tuned on a manually annotated dataset to support named entity recognition. The model was evaluated using precision, recall, F1-score, and processing speed, and its performance was compared with manual extraction. McNemar’s test was used to assess the statistical significance of differences between the two methods.
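The paired comparison described in the Methods can be illustrated with a minimal sketch of an exact McNemar test. It assumes each message has been scored as correctly or incorrectly extracted by both the model and the manual process; the function names and the choice of the exact binomial form are illustrative assumptions, since the abstract does not state which variant of the test was used.

```python
from math import comb


def discordant_counts(model_correct, manual_correct):
    """Count discordant pairs between two paired lists of booleans.

    b: messages the model extracted correctly but manual extraction did not.
    c: messages manual extraction got right but the model did not.
    (Hypothetical helper; the dissertation's scoring scheme is not specified.)
    """
    b = sum(1 for m, h in zip(model_correct, manual_correct) if m and not h)
    c = sum(1 for m, h in zip(model_correct, manual_correct) if h and not m)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under H0 the discordant pairs split 50/50, so the smaller count
    # follows Binomial(n, 0.5); sum the lower tail and double it.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only the discordant pairs carry information about a difference between the two methods, which is why messages that both methods handled correctly (or both handled incorrectly) do not enter the calculation.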
Results: The model achieved an F1-score of 92.6%, with recall of 94.2% and precision of 91.1%, processing approximately 48 messages per second. It extracted high-value entities such as disease, age, gender, and location with near-perfect accuracy. Errors were concentrated around symptom span boundaries and ambiguous entries. Interviews confirmed the value of automation for reducing analyst workload and outlined key limitations of the current manual workflow, including the handling of ambiguous or duplicate messages.
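As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the values given above; the helper names and the batch size of 1,000 messages are hypothetical and used only for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def seconds_for_batch(n_messages, messages_per_second=48):
    """Approximate processing time at the reported throughput."""
    return n_messages / messages_per_second


# Values reported in the Results: precision 91.1%, recall 94.2%.
f1 = f1_score(0.911, 0.942)
print(f"F1 = {f1:.1%}")  # F1 = 92.6%

# Hypothetical batch of 1,000 messages (batch size is an assumption).
print(f"{seconds_for_batch(1000):.1f} s")  # 20.8 s
```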
Conclusion: This study demonstrated the feasibility of applying NLP to automate SMS-based disease surveillance within Uganda’s eIDSR system. Although human review remains necessary for edge cases, the model showed strong potential to accelerate processing, eliminate backlog, and support timely response under frameworks like 7-1-7. With targeted improvements, especially in symptom handling and multilingual input, the model would be suitable for pilot integration under a human-in-the-loop deployment model.
Description
A dissertation submitted to the School of Public Health in partial fulfilment of the requirements for the award of the degree of Master of Health Informatics at Makerere University, Kampala.
Citation
Nakitandwe, R.M. (2026). Enhancing outbreak surveillance through integration of natural language processing in Uganda’s electronic integrated disease surveillance and response system. (Unpublished master's dissertation). Makerere University, Kampala, Uganda.