Explainable ensemble machine learning for SQL injection attack detection

dc.contributor.author Sekyewa, Raymond
dc.date.accessioned 2025-12-04T13:28:46Z
dc.date.available 2025-12-04T13:28:46Z
dc.date.issued 2025
dc.description A dissertation submitted to the School of Graduate Studies in partial fulfillment for the award of Master of Science in Data Communication and Software Engineering Degree of Makerere University
dc.description.abstract SQL injection (SQLi) remains a major cybersecurity threat that exploits weaknesses in database-driven web applications to gain unauthorized access to sensitive data. Existing detection systems often rely on static rule sets and opaque machine learning models that lack interpretability, adaptability, and robustness against new attack variations. To address these limitations, this study developed an explainable hybrid ensemble machine learning model for SQL injection detection. The proposed framework integrates transformer-based semantic understanding with statistical query profiling to enhance both accuracy and interpretability. A dataset of 22,470 SQL queries collected from two production systems at Makerere University, namely the Makerere University E-Learning Environment (MUELE) and the Electronic Human Resource Management System (EHRMS), was used for model development and evaluation. The dataset included six major SQLi categories (tautology-based, union query, piggy-backed, comment-based, illegal/logically incorrect, and blind SQLi), allowing performance to be analyzed across diverse attack types. Feature engineering played a central role in the model’s success. Contextual features were extracted using Bidirectional Encoder Representations from Transformers (BERT), capturing the semantic meaning of SQL syntax and revealing obfuscated injection patterns undetectable by traditional methods. These semantic embeddings were combined with handcrafted statistical indicators such as query length, special-character frequency, and keyword density, enabling detection of structural anomalies indicative of SQL injection behavior. This hybrid representation captured both syntactic and semantic query characteristics, improving model sensitivity and interpretability.
Multiple classifiers, including Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, K-Nearest Neighbors, Naïve Bayes, Gradient Boosting, LightGBM, and CatBoost, were trained and evaluated. Ensemble techniques such as bagging, boosting, and voting were applied to enhance generalization performance. The proposed boosting-based ensemble model achieved an accuracy of 99.49%, with F1-scores of 96.87% for benign queries and 99.72% for malicious queries. Explainability was incorporated through SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-Agnostic Explanations). SHAP analysis revealed that BERT embeddings contributed approximately 45% of the model’s predictive power, while features such as tautological conditions and comment-based patterns were key indicators of SQLi attacks. The final model was deployed as a RESTful FastAPI microservice capable of processing over 10 queries per second with average response times of 150–200 ms. The study demonstrates that combining semantic embeddings with statistical features in an explainable ensemble framework yields a robust, interpretable, and production-ready solution for SQL injection detection.
Keywords: Machine learning, SQL Injection Attack Detection
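The handcrafted statistical indicators named in the abstract (query length, special-character frequency, and keyword density) could be computed along the following lines. This is a minimal sketch only: the function name, the keyword list, and the special-character set are illustrative assumptions, since the abstract does not state the definitions used in the dissertation.

```python
import re

# Hypothetical keyword and special-character sets, chosen for illustration;
# the abstract does not list the ones used in the dissertation.
SQL_KEYWORDS = {"select", "union", "insert", "update", "delete",
                "drop", "or", "and", "where", "from"}
SPECIAL_CHARS = set("'\";-#=()*/")


def statistical_features(query: str) -> dict:
    """Return simple structural indicators for one SQL query string."""
    # Alphabetic tokens only, lower-cased so keyword matching is case-insensitive.
    tokens = re.findall(r"[A-Za-z_]+", query.lower())
    length = len(query)
    special = sum(1 for ch in query if ch in SPECIAL_CHARS)
    keywords = sum(1 for tok in tokens if tok in SQL_KEYWORDS)
    return {
        "length": length,
        # Share of characters that are SQL-significant punctuation.
        "special_char_freq": special / length if length else 0.0,
        # Share of alphabetic tokens that are SQL keywords.
        "keyword_density": keywords / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    print(statistical_features("SELECT * FROM users"))
    print(statistical_features("' OR 1=1 --"))
```

In a hybrid representation of the kind the abstract describes, a vector of such indicators would be concatenated with the BERT embedding of the same query before being passed to the classifiers.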
dc.identifier.citation Sekyewa, R. (2025). Explainable ensemble machine learning for SQL injection attack detection; Unpublished dissertation, Makerere University, Kampala
dc.identifier.uri https://makir.mak.ac.ug/handle/10570/15494
dc.language.iso en
dc.publisher Makerere University
dc.title Explainable ensemble machine learning for SQL injection attack detection
dc.type Other
Files
Original bundle
Name: Sekyewa-Masters-2025.pdf
Size: 3.77 MB
Format: Adobe Portable Document Format
Description: Masters dissertation