Developing a Natural Language Processing Model to Characterize Mobility Patterns of M. Tuberculosis Cases in Lubaga and Kawempe Divisions of Kampala
Date
2025
Authors
Amutuheire, Drake
Publisher
Makerere University
Abstract
Background: Tuberculosis (TB) remains a major public health threat in Uganda, particularly in urban settings such as the Lubaga and Kawempe divisions of Kampala, where community transmission is high. Over 75 percent of new TB infections are estimated to occur outside the household, yet traditional surveillance approaches fail to adequately capture these community-level dynamics. In congested areas like Lubaga and Kawempe, human movement creates complex and dynamic interactions among markets, transport hubs, education centers, and health facilities. These movements are vital to the local economy, but they also serve as potential pathways for infectious disease spread, including the spread of TB over long distances. Understanding how mobility contributes to TB transmission is therefore critical for designing effective, targeted interventions.
Objective: This study aimed to develop and validate a Natural Language Processing (NLP) model to characterize the mobility patterns of TB patients in Kampala using Call Detail Records (CDRs), and to identify high-risk transmission hotspots through network analysis.
Methods: A retrospective cohort study design was used to analyze four years of CDR metadata from 400 bacteriologically confirmed TB patients enrolled in the Mapping Tuberculosis Transmission Study (MATTS). Preprocessing steps included geocoding, timestamp normalization, and removal of routine movements, treated analogously to stop words in text. Semantic and spatiotemporal features were extracted using TF-IDF weighting, cosine similarity, and Doc2Vec embeddings. The DBSCAN algorithm was used for mobility clustering, and a directed weighted mobility network was constructed. Centrality metrics, including degree, betweenness, and closeness, were computed to identify key convergence zones.
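As an illustration only, and not the study's actual pipeline, the sketch below shows one way location sequences could be treated as text: each patient's trajectory is a "document" whose "words" are visited sites, weighted by TF-IDF and compared by cosine similarity, and consecutive visits form a directed weighted network. The toy trajectories, the smoothed IDF formula, and all function names are assumptions made for this example; Doc2Vec, DBSCAN, and betweenness and closeness centrality are omitted.

```python
import math
from collections import Counter

# Hypothetical toy data: each patient's ordered list of visited sites.
trajectories = {
    "p1": ["Kisekka Market", "Bwaise", "Mulago Hospital", "Bwaise"],
    "p2": ["Bwaise", "Mulago Hospital", "Kisekka Market"],
    "p3": ["Home A", "School B", "Home A"],
}


def tfidf_vectors(docs):
    """Return {doc_id: {site: weight}} using term frequency and a smoothed IDF."""
    n_docs = len(docs)
    doc_freq = Counter()
    for sites in docs.values():
        doc_freq.update(set(sites))  # count each site once per patient
    vectors = {}
    for doc_id, sites in docs.items():
        counts = Counter(sites)
        total = len(sites)
        vectors[doc_id] = {
            site: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[site])) + 1)
            for site, count in counts.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(site, 0.0) for site, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def mobility_network(trajs):
    """Directed weighted edges: weight = number of observed A -> B moves."""
    edges = Counter()
    for stops in trajs.values():
        for src, dst in zip(stops, stops[1:]):
            if src != dst:  # ignore repeated records at the same site
                edges[(src, dst)] += 1
    return edges


def degree_centrality(edges):
    """In-degree plus out-degree over distinct edges, normalized by n - 1."""
    nodes = {node for edge in edges for node in edge}
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    denom = max(len(nodes) - 1, 1)
    return {node: degree[node] / denom for node in nodes}


vectors = tfidf_vectors(trajectories)
edges = mobility_network(trajectories)
centrality = degree_centrality(edges)
```

In this toy data, patients p1 and p2 share sites and so have a positive cosine similarity, while p1 and p3 share none and score 0. Sites that many trajectories pass through, such as Bwaise here, receive higher degree centrality, which is the intuition behind using centrality to flag convergence zones.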
Results: TB patients visited an average of 54 unique locations, exhibiting heterogeneous yet patterned mobility behavior. Spatial clustering revealed repeated convergence at urban markets, slums, health facilities, and transportation hubs, particularly around Kisekka Market, Bwaise, and Mulago Hospital. The mobility network showed a small number of highly central nodes linking large segments of patient trajectories. The model achieved a mean Jaccard similarity score of 0.941 for trajectory reconstruction and a silhouette score of 0.387, indicating moderate internal cluster consistency.
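For readers unfamiliar with the Jaccard metric reported above, the following minimal sketch shows how a mean Jaccard similarity could be computed between observed and reconstructed sets of visited locations. The abstract does not specify how the study paired or compared trajectories, so the set-based comparison, the toy data, and the function names here are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of sites."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def mean_jaccard(observed, reconstructed):
    """Average Jaccard score over patients present in both mappings."""
    shared = sorted(set(observed) & set(reconstructed))
    if not shared:
        raise ValueError("no patients in common")
    return sum(jaccard(observed[p], reconstructed[p]) for p in shared) / len(shared)


# Hypothetical toy data, not taken from the study.
observed = {
    "p1": ["Kisekka Market", "Bwaise", "Mulago Hospital"],
    "p2": ["Bwaise", "Home A", "School B"],
}
reconstructed = {
    "p1": ["Kisekka Market", "Bwaise", "Mulago Hospital"],
    "p2": ["Bwaise", "Home A", "Market C"],
}

score = mean_jaccard(observed, reconstructed)  # (1.0 + 0.5) / 2 = 0.75
```

A score near 1, such as the 0.941 reported, would mean that reconstructed and observed location sets overlap almost completely under whatever comparison the study used.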
Conclusion: Applying NLP and network analysis to mobile phone data provides novel insights into TB mobility and transmission dynamics in urban African settings. The study identifies high-risk locations that may act as transmission amplifiers and recommends that TB control programs prioritize these zones for targeted surveillance and Active Case Finding (ACF) interventions.