An approach to enhance the performance of the Xgboost Classifier
Date
2025
Authors
Muhwezi, Raymond Mugisha
Publisher
Makerere University
Abstract
XGBoost is a dominant machine learning model for prediction and classification tasks. It is an ensemble algorithm that often outperforms other machine learning models owing to its predictive performance, efficiency, and regularization techniques that prevent overfitting and underfitting. However, its heavy reliance on hyperparameter tuning is a computational weakness, because traditional methods such as grid search and random search are resource-intensive. Furthermore, the raw features used for classification may contain complex, non-linear relationships that are not explicitly captured by XGBoost's base learners. This study proposed an improved alternative that combines k-means clustering with Bayesian-optimized XGBoost. To validate the approach, the study used the red wine dataset from the UCI data repository. We first derived objective quality clusters from physicochemical attributes (such as acidity, sugar, and alcohol content) using k-means. Two hyperparameter tuning approaches were then compared: (1) traditional hyperparameters and (2) Bayesian optimization. When evaluated, the cluster-based model with Bayesian optimization achieved 97.9% accuracy, a 97.4% F1-score, and 98.05% recall, whereas the baseline model achieved 93.1% accuracy, a 96.18% F1-score, and 97.2% recall. These results demonstrate that integrating k-means clustering with Bayesian optimization significantly enhances the classification performance of the XGBoost classifier compared to the use of traditional hyperparameters. Consequently, we recommend deploying this validated model in real-world applications, such as automated wine quality grading, as well as in other industrial domains that require scalable and accurate classification solutions.
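The clustering-then-classification idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation and rests on several assumptions: the cluster labels are taken to be the classification target (the abstract does not spell this out), synthetic random data stands in for the red wine measurements, the choice of three clusters is arbitrary, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so that the example depends only on NumPy and scikit-learn. The Bayesian hyperparameter search is omitted; fixed hyperparameter values are used instead.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for physicochemical measurements (NOT the wine data).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))

# Step 1: standardise the features, then derive cluster labels with k-means.
# Assumption: these labels act as the classification target.
# For simplicity the scaler and k-means are fit on all rows here.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X_scaled)

# Step 2: train a boosted-tree classifier to predict the cluster labels.
# GradientBoostingClassifier is a stand-in for XGBoost; the hyperparameters
# are fixed by hand rather than tuned with Bayesian optimization.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, cluster_labels, test_size=0.25, random_state=0,
    stratify=cluster_labels,
)
clf = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=0
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Step 3: report the metrics named in the abstract (macro-averaged here).
scores = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
    "recall_macro": recall_score(y_test, y_pred, average="macro"),
}
print(scores)
```

The scores printed by this sketch describe synthetic data only and are not comparable to the figures reported in the abstract.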
Description
A dissertation submitted to the Directorate of Graduate Training in partial fulfillment of the requirements for the award of the Degree of Master of Statistics of Makerere University
Citation
Muhwezi, R. M. (2025). An approach to enhance the performance of the Xgboost Classifier. Unpublished master's dissertation, Makerere University, Kampala.