An application of Data Mining Classification Techniques to Electricity Fraud Detection; A Case of Umeme (U) Ltd
The main objective of this research was to develop a suitable classification model able to identify and predict customers with fraudulent consumption. The research addresses the lack of effective methods for fraud detection in electricity consumption by focusing on four data mining classification techniques: K-Nearest Neighbors, Logistic Regression, Gaussian Naïve Bayes, and Linear Discriminant Analysis. These techniques were applied to classify consumption data from an electricity distribution company into fraud and non-fraud consumption. The data consisted of customers' electricity consumption for a period of 45 months, from January 2014 to September 2017, and the customers' electricity irregularities data (anomaly cases) for the same period, based on random inspections undertaken by UMEME Ltd. The results indicated that the Linear Discriminant Analysis classifier is the best suited of the four for anomaly detection, achieving an accuracy of 96%. The K-Nearest Neighbors classifier obtained an accuracy of 93%, the Logistic Regression classifier obtained 79%, and the Gaussian Naïve Bayes classifier obtained 27%, the weakest performance of the four models. The weak performance of the Naïve Bayes algorithm is attributed to its treatment of all input dimensions as independent of each other, which is not always true in a real-world setting. In other words, if there is covariance between two or more input dimensions, the Gaussian Naïve Bayes classifier does not model it. This research recommends that UMEME Ltd explore the possibility of using the Linear Discriminant Analysis classifier to detect electricity fraud. Furthermore, the selected techniques work as black boxes, without induced descriptive rules showing how the attributes indicate fraudulent behavior, so this research also recommends that rule-based models be explored.
In addition, the detection algorithms could be enhanced by introducing more real-world variables, such as a customer's economic situation or past illegal activity.
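The point about covariance between input dimensions can be illustrated with a minimal sketch. It does not use the thesis data or reproduce its results: the two-feature synthetic data, the correlation value `rho = 0.95`, the equal class priors, and all function names are assumptions made for illustration only. The sketch fits a Gaussian Naïve Bayes model (one variance per feature, no covariance) and a Linear Discriminant Analysis model (one pooled covariance matrix) on the same strongly correlated data and compares their accuracy.

```python
import math
import random


def make_data(n_per_class, rho, rng):
    """Synthetic data: two features with correlation rho; class 1 is shifted by +1 on feature 0."""
    rows = []
    for label in (0, 1):
        for _ in range(n_per_class):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            x1 = z1 + label
            x2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2
            rows.append(((x1, x2), label))
    rng.shuffle(rows)
    return rows


def class_means(rows):
    means = {}
    for label in (0, 1):
        pts = [x for x, y in rows if y == label]
        means[label] = tuple(sum(p[j] for p in pts) / len(pts) for j in range(2))
    return means


def fit_gnb(rows):
    """Gaussian Naive Bayes: a mean and a variance per feature per class, no covariance term."""
    means = class_means(rows)
    variances = {}
    for label in (0, 1):
        pts = [x for x, y in rows if y == label]
        variances[label] = tuple(
            sum((p[j] - means[label][j]) ** 2 for p in pts) / len(pts) for j in range(2)
        )
    return means, variances


def predict_gnb(model, x):
    means, variances = model

    def log_likelihood(label):
        # Features are treated as independent, so per-feature log densities are summed.
        return sum(
            -0.5 * math.log(2 * math.pi * variances[label][j])
            - (x[j] - means[label][j]) ** 2 / (2 * variances[label][j])
            for j in range(2)
        )

    return max((0, 1), key=log_likelihood)  # equal priors assumed


def fit_lda(rows):
    """LDA: class means plus one pooled 2x2 covariance matrix shared by both classes."""
    means = class_means(rows)
    sxx = sxy = syy = 0.0
    for x, y in rows:
        dx, dy = x[0] - means[y][0], x[1] - means[y][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    n = len(rows)
    sxx, sxy, syy = sxx / n, sxy / n, syy / n
    det = sxx * syy - sxy * sxy
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return means, inverse


def predict_lda(model, x):
    means, inv = model

    def neg_mahalanobis(label):
        dx, dy = x[0] - means[label][0], x[1] - means[label][1]
        return -(dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy))

    # With equal priors, LDA picks the class whose mean is nearest in Mahalanobis distance.
    return max((0, 1), key=neg_mahalanobis)


def accuracy(predict, model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


rng = random.Random(0)
train = make_data(2000, rho=0.95, rng=rng)
test = make_data(2000, rho=0.95, rng=rng)
gnb_acc = accuracy(predict_gnb, fit_gnb(train), test)
lda_acc = accuracy(predict_lda, fit_lda(train), test)
print(f"Gaussian Naive Bayes accuracy: {gnb_acc:.3f}")
print(f"LDA accuracy: {lda_acc:.3f}")
```

On data of this kind, the second feature carries no class signal on its own but is highly informative once its correlation with the first feature is taken into account. The pooled covariance lets LDA exploit that, while Naïve Bayes cannot. The size of the gap here depends entirely on the assumed synthetic setup and says nothing about the accuracies reported for the UMEME data.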