A predictive machine learning framework for diabetes
Authors : Danjuma Maza, Joshua Olufemi Ojo, Grace Olubumi Akinlade
Pages : 583-592
DOI : 10.31127/tuje.1434305
Publication Date : 2024-07-28
Article Type : Research
Abstract : Diabetes, a non-communicable disease, is characterized by excess glucose in the bloodstream. In 2022, an estimated 422 million people were living with the disease globally. In 2015, the impact of diabetes on the world economy was estimated at $1.31 trillion, and the disease was implicated in the deaths of 5 million adults aged 20 to 79 years globally. If left untreated for an extended time, diabetes can lead to a host of other health complications. Predictive models that supplement the diagnostic process and aid the early detection of diabetes are therefore important. The current study develops a machine learning framework for the prediction of diabetes, intended to aid medical practitioners in the early detection of the disease. The dataset used in this investigation was sourced from the Kaggle database. It consists of 100,000 entries, with 8,500 diabetics and 91,500 non-diabetics, and is therefore imbalanced. The dataset was modified to obtain a balanced dataset of 8,500 entries each for the diabetic and non-diabetic classes. Of the fifteen classifiers compared, the three best performers were the Gradient Boosting classifier (GBC), the Adaptive Boosting classifier (ADA), and the Light Gradient Boosting Machine (LGBM). The proposed framework is a stack model consisting of GBC, ADA, and LGBM, with the ADA classifier used as the meta-model. This model achieved an average accuracy, area under the curve (AUC), recall, precision, and F1-score of 91.12 ± 0.75 %, 97.83 ± 0.29 %, 92.03 ± 1.55 %, 90.40 ± 1.01 %, and 91.12 ± 0.77 %, respectively. The selling point of the proposed framework is the high recall of 92.03 ± 1.55 %, indicating that the model is sensitive to both the diabetic and the non-diabetic classes.
Keywords : Classification, Diabetes, Prediction, Accuracy, Recall