Effect of substitution mutations on the temporal dynamics of HIV-1 subtypes A1 and D in the Ugandan epidemic
Abstract
Background: Subtypes A1, D and inter-subtype recombinants dominate the Ugandan HIV-1 epidemic. This diversity has affected disease diagnosis and therapeutics. Predictive modeling could improve understanding of current patterns and forecast future HIV-1 epidemics, informing the development of timely interventions.
Objectives: We determined the normalized temporal dynamics of Ugandan HIV-1 subtypes A1, C, D and recombinants, determined the frequency and distribution of substitution mutations within subtypes A1 and D in the HIV-1 gag gene, and developed a prediction model using time-series data from 1990 to 2019 to forecast mutation patterns.
Methods: We analyzed 971 gag and 278 near full-length (NFL) sequences from the Los Alamos HIV sequence database (LANL). Phylogenetic analysis was used to assess evolutionary relationships, substitution mutation frequencies were inferred using R software, and mutation hotspots and cold spots were determined. ARIMA models were explored to project mutation patterns for the year 2025. Model performance was assessed using the Akaike Information Criterion (AIC) and Mean Absolute Error (MAE), and the Ljung-Box test was used to test for residual autocorrelation.
Results: A total of 218 A1, 341 D, 7 C and 405 recombinant sequences were obtained for the temporal analysis. There were no sequences for some years, e.g. 1996-2004. Prior to 1999, the pure subtypes A1 and D were dominant. However, A1/D recombinants increased exponentially after 1999, with the distribution of A1 and D stabilizing at approximately 25% each for the rest of the study period. An average mutation frequency of 3.6% was observed in the gag sequences, with 4.5% (67 out of 1500) of positions defined as hotspots at the nucleotide level. In the NFL sequences, transitions (0.0317) were more frequent than transversions (0.0159), with a p-value of 9.537 × 10⁻⁷. The ARIMA(0,1,0) model outperformed the other models (AIC -78.43), with a Ljung-Box p-value of 0.3609.
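To illustrate the distinction between transitions and transversions reported above, the following is a minimal sketch in Python. It is not the authors' R pipeline: the function names `classify` and `substitution_counts` and the toy sequences are hypothetical, and it assumes two pre-aligned sequences of equal length, skipping gaps and ambiguous bases.

```python
from collections import Counter

BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def substitution_counts(reference, query):
    """Count substitutions between two aligned sequences of equal length.

    Positions with gaps or ambiguous bases (anything outside A, C, G, T)
    are skipped. This is an illustrative assumption, not the paper's rule.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for r, q in zip(reference.upper(), query.upper()):
        if r in BASES and q in BASES and r != q:
            counts[f"{r}>{q}"] += 1          # specific change, e.g. "C>G"
            counts[classify(r, q)] += 1      # broad class
    return counts


if __name__ == "__main__":
    # Toy sequences for illustration only; not data from the study.
    print(substitution_counts("ACGTAC", "GCTTAT"))
```

Dividing each count by the number of compared positions would give per-site frequencies comparable in form to those reported for the NFL sequences.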
Conclusion: These findings suggest a shift from dominance by pure subtypes A1 and D to a balanced coexistence with recombinant forms. Substitution mutations in the NFL genomes are more often transitions than transversions. We predict that in 2025 C→G substitutions will be the most frequent, followed by G→T and G→C, while T→G substitutions will be the least frequent, underscoring the need to monitor these patterns.
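For readers unfamiliar with ARIMA(0,1,0), the model is a random walk: each value equals the previous value plus noise, so without a drift term every future forecast equals the last observation. The sketch below, in plain Python, shows this together with the MAE of one-step residuals and the Ljung-Box Q statistic. It is an illustrative assumption, not the authors' fitted model: whether their model included drift is not stated, the helper names are hypothetical, and the example series is made up.

```python
def _drift(series, drift):
    """Mean first difference if a drift term is requested, else 0."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sum(diffs) / len(diffs) if drift else 0.0


def random_walk_forecast(series, steps, drift=False):
    """Forecast `steps` points ahead under ARIMA(0,1,0)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    c = _drift(series, drift)
    return [series[-1] + c * h for h in range(1, steps + 1)]


def one_step_residuals(series, drift=False):
    """In-sample residuals: y[t] - (y[t-1] + c)."""
    c = _drift(series, drift)
    return [b - a - c for a, b in zip(series, series[1:])]


def mean_absolute_error(residuals):
    return sum(abs(r) for r in residuals) / len(residuals)


def ljung_box_q(residuals, lags):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n-k), k = 1..lags."""
    n = len(residuals)
    if not 0 < lags < n:
        raise ValueError("lags must be between 1 and n-1")
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(x * x for x in centered)
    if denom == 0:
        raise ValueError("residuals have zero variance")
    total = 0.0
    for k in range(1, lags + 1):
        # Lag-k sample autocorrelation of the residuals.
        rho = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += rho * rho / (n - k)
    return n * (n + 2) * total


if __name__ == "__main__":
    # Hypothetical yearly mutation frequencies; not data from the study.
    freq = [0.10, 0.12, 0.11, 0.15, 0.14]
    print(random_walk_forecast(freq, steps=6))
    print(mean_absolute_error(one_step_residuals(freq)))
    print(ljung_box_q(one_step_residuals(freq), lags=2))
```

Under the null hypothesis of no residual autocorrelation, Q is compared against a chi-square distribution to obtain a p-value. A large p-value, such as the 0.3609 reported here, gives no evidence of remaining autocorrelation.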