A predictive model for the early detection of diabetes using machine learning: a case study of Kiruddu Hospital
Abstract
Diabetes mellitus is a major chronic non-communicable disease that arises either when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces. If diabetes remains unidentified and untreated, many complications can occur, including blindness, renal failure, and foot disorders that carry a risk of amputation. Machine learning approaches can help address this problem through early detection, which can reduce patients' health risks. The aim of this study was to design a model and develop a web application that predicts the likelihood of a patient having diabetes as accurately as possible.
Methods
This was a retrospective study based on the retrieval and review of secondary data from a total of 1899 patients at Kiruddu National Referral Hospital. Several machine learning classification algorithms were compared using the Weka tool to determine which most accurately predicts a person's likelihood of having diabetes, so that the disease can be detected at an early stage. The performance of each algorithm was evaluated using accuracy, precision, recall, and F-measure. Accuracy was computed as the proportion of correctly classified instances out of all instances, both correctly and incorrectly classified.
Results
All models were trained on 1899 instances. The LMT (logistic model tree) model correctly classified the most instances (1794), whereas the IBK model (Weka's k-nearest-neighbour classifier) correctly classified the fewest (987). LMT had the highest accuracy (94.47%) and the lowest root mean squared error (0.21), whereas IBK had the lowest accuracy (51.97%) and the highest root mean squared error (0.51). The Multilayer Perceptron model had the second-highest accuracy (88.63%), with a root mean squared error of 0.31.
Conclusion
The LMT model achieved the highest accuracy of the models compared and was therefore the best-performing model. For this reason, it was integrated into the web application to predict, at an early stage, the likelihood of a person developing diabetes.
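
The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the study's Weka workflow: the function names are hypothetical, the true-positive, false-positive and false-negative counts passed to `precision_recall_f_measure` are made-up illustration values, and `rmse` uses one common definition (over predicted probabilities for the positive class) that may differ from Weka's exact computation. Only the totals 1899, 1794 and 987 come from the reported results.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Percentage of instances that were classified correctly."""
    return 100.0 * correct / total


def precision_recall_f_measure(tp: int, fp: int, fn: int):
    """Precision, recall and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def rmse(y_true, p_pred) -> float:
    """Root mean squared error between 0/1 labels and predicted probabilities.

    Assumption: one common definition; Weka's computation may differ.
    """
    squared = [(p - y) ** 2 for y, p in zip(y_true, p_pred)]
    return math.sqrt(sum(squared) / len(squared))


TOTAL_INSTANCES = 1899  # from the reported results

lmt_accuracy = accuracy(1794, TOTAL_INSTANCES)  # LMT: 1794 correct
ibk_accuracy = accuracy(987, TOTAL_INSTANCES)   # IBK: 987 correct

print(f"LMT accuracy: {lmt_accuracy:.2f}%")  # 94.47%
print(f"IBK accuracy: {ibk_accuracy:.2f}%")  # 51.97%

# Hypothetical counts, for illustration only (not from the study).
p, r, f = precision_recall_f_measure(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

Recomputing accuracy from the reported counts of correctly classified instances reproduces the stated 94.47% for LMT and 51.97% for IBK, which is a quick consistency check on the figures.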