Application of random forest regressor algorithm to predict PM2.5 concentration levels in Kampala
Abstract
Everyone wishes to live in a clean and healthy environment. This cannot always be achieved in everyday life, but the level of pollution can at least be controlled. Air pollution is one of the leading global public health risks, yet its magnitude in many developing countries is not known. As in many African cities, the concentration of fine particulate matter (PM2.5) in Kampala is dangerously high. This thesis uses data mining algorithms to build a predictive model of the PM2.5 concentration level for the following days. Predicting pollutant concentrations can be a powerful tool for taking preventive measures, such as reducing emissions and alerting the affected population. The thesis presents a forecasting model that predicts the daily average concentration of PM2.5 for the next few days (i.e., 3 to 5 days). The proposed model is a Random Forest regressor, which was compared with four other regression models: Extra Trees Regressor, Gaussian Process Regressor, XGBoost, and Elastic Net. Performance was evaluated using the Root Mean Square Error (RMSE), the Mean Absolute Error (MAE), and R-squared (R2). The results showed that the Random Forest regressor outperformed the other models.
Six pollution monitoring stations in Kampala that measure PM2.5 were selected. We found that the mean PM2.5 concentration was three times higher than the level recommended by the World Health Organization (WHO).
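
For readers unfamiliar with the three evaluation measures named above, the following is a minimal sketch of how RMSE, MAE and R2 can be computed from observed and predicted daily averages. The function names and the sample values are illustrative assumptions, not data or code from this thesis.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily average PM2.5 values (made up for illustration only).
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 73.0, 79.0]

scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r2(observed, predicted),
}
print(scores)
```

Lower RMSE and MAE indicate smaller forecast errors, while an R2 closer to 1 indicates that the model explains more of the variance in the observed concentrations. RMSE penalizes large errors more heavily than MAE, which is one reason the two are commonly reported together.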