Dimension reduction and regularization approaches in response to multicollinearity problem of the prediction of student performance in the final national examination: case of North-Kivu/Goma-DRC

Date
2025-11
Authors
Paluku, Mwenge Gloire
Publisher
Makerere University
Abstract
Multicollinearity among predictors in regression models can inflate the variance of coefficient estimates and undermine the stability and interpretability of the parameter estimates. This study conducts a comparative analysis of four widely used approaches: two dimension reduction methods, Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR), and two regularization methods, Ridge and Lasso regression. Using a student performance dataset, we assessed these methods in terms of model fit, predictive performance, variable selection capability, and interpretability. The relative effectiveness and prediction accuracy of each approach were determined using a paired t-test on the residuals (MSE) of competing models, the Akaike Information Criterion (AIC), and R-squared (R2). The paired t-test showed no significant difference between the residuals of Lasso and Ridge regression (P-value = 0.15), suggesting that the two models have comparable predictive performance. It did reveal a significant difference between the residuals of PLSR and PCR (P-value = 0.00), indicating that the PLS regression model outperformed the PC regression model in predictive accuracy. In terms of model fit, Lasso regression outperformed the other approaches with the lowest AIC (-2229.66), followed by Ridge regression (AIC = -2225.38), PLSR (AIC = -1041.05), and PCR (AIC = -1013.925). Lasso regression also had the highest R2 value (72.7%), indicating the largest proportion of variance explained in the dependent variable.
Across all approaches, the regression models identified Sex, School of the student, and Cohort of study as the most significant covariate predictors of the final score at the national examination. In addition, three of the four principal components (Humanities and Social Sciences, Science and Mathematics, Creative and Technical Skills) were significant predictors of the final score. In the Ridge and Lasso regression approaches, French, Chemistry, Civism, English, Probability, Biology, Philosophy, and Algebra emerged as the most important predictors of the final score. Keywords: Student performance, Final national examination, North-Kivu/Goma-DRC
Description
A dissertation submitted to the Directorate of Research and Graduate Training in partial fulfilment of the requirements for the award of a Degree of Master of Statistics of Makerere University
Citation
Paluku, M. G. (2025). Dimension reduction and regularization approaches in response to multicollinearity problem of the prediction of student performance in the final national examination: case of North-Kivu/Goma-DRC. Unpublished master’s thesis, Makerere University