This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Clinically adaptable machine learning model to identify early appreciable features of diabetes
31 citations · 6 authors · 2023
Abstract
Objective
Diabetes mellitus is a serious disease in which the body of an affected patient fails to produce enough insulin, resulting in abnormal blood sugar levels. It has a number of contributing causes, including modern lifestyle, sedentary habits, an unhealthy diet, family history, age, and excess weight. The aim of this study is to propose a machine-learning-based prediction model that detects diabetes at an early stage.

Methods
We first obtained 520 patient records from the Sylhet Diabetes Hospital (Sylhet, Bangladesh) dataset in the University of California, Irvine (UCI) Machine Learning Repository. We then followed a questionnaire similar to the one used by that hospital to collect 558 patient records from across Bangladesh, and we also pooled the records of these two datasets into a combined dataset. Next, the datasets were cleaned and thirty-five state-of-the-art classifiers were applied to find the best and most stable predictive model: logistic regression (LR), k-nearest neighbors (KNN), support vector classifier (SVC), naive Bayes (NB), decision tree (DT), random forest (RF), stochastic gradient descent (SGD), Perceptron, AdaBoost, XGBoost, passive aggressive classifier (PAC), ridge classifier (RC), Nu-support vector classifier (Nu-SVC), linear support vector classifier (LSVC), calibrated classifier CV (CCCV), nearest centroid (NC), Gaussian process classifier (GPC), multinomial NB (MNB), Complement NB, Bernoulli NB (BNB), Categorical NB, Bagging, extra trees (ET), gradient boosting classifier (GBC), histogram-based gradient boosting classifier (HGBC), One-vs-Rest classifier (OVsRC), multi-layer perceptron (MLP), label propagation (LP), label spreading (LS), stacking, ridge classifier CV (RCCV), logistic regression CV (LRCV), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and light gradient boosting machine (LGBM).
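One way to picture the comparison step is a cross-validated ranking: each classifier is scored on held-out folds of the pooled records, and the spread of its fold scores serves as a rough proxy for stability. The sketch below is a minimal standard-library illustration under stated assumptions, not the authors' pipeline: the record layout (binary feature vectors with 0/1 labels), the stratified 5-fold protocol, the helper names (`combine`, `stratified_folds`, `cross_validate`, `compare`) and the two toy classifiers are all hypothetical stand-ins for the thirty-five models listed above.

```python
import random
from statistics import mean, stdev


def combine(*datasets):
    """Pool several (X, y) datasets; assumes (hypothetically) that all share one feature order."""
    width = len(datasets[0][0][0])
    X, y = [], []
    for Xi, yi in datasets:
        if any(len(row) != width for row in Xi):
            raise ValueError("datasets do not share the same feature schema")
        X.extend(Xi)
        y.extend(yi)
    return X, y


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # round-robin keeps per-fold class counts balanced
    return folds


def majority_classifier(train_X, train_y):
    """Baseline: always predict the most frequent training label."""
    top = max(set(train_y), key=train_y.count)
    return lambda x: top


def nearest_neighbor_classifier(train_X, train_y):
    """1-nearest neighbor with Hamming distance, suited to yes/no symptom features."""
    def predict(x):
        dists = [sum(a != b for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def cross_validate(fit, X, y, k=5, seed=0):
    """Per-fold accuracies for a classifier factory fit(X, y) -> predict(x)."""
    scores = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


def compare(models, X, y, k=5, seed=0):
    """Rank models by mean fold accuracy; the fold standard deviation is a stability proxy."""
    results = []
    for name, fit in models.items():
        scores = cross_validate(fit, X, y, k, seed)
        results.append((name, mean(scores), stdev(scores)))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Calling `compare({"majority": majority_classifier, "1-NN": nearest_neighbor_classifier}, X, y)` on a pooled `(X, y)` returns `(name, mean_accuracy, std)` tuples, best first. In the spirit of the abstract's conclusion, a model would be preferred when it is both near the top of the ranking and shows a small standard deviation across folds and datasets.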
The performance of the classifiers was measured with five metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). Finally, the outcomes were interpreted using SHapley Additive exPlanations (SHAP) to identify the features most relevant to the onset of diabetes.

Results
ET outperformed all other classifiers on the Sylhet Diabetes Hospital dataset (SDHD) with 97.11% accuracy, while MLP achieved the best accuracy (96.42%) on the collected dataset. On the combined dataset, HGBC and LGBM each achieved the highest accuracy, 94.90%.

Conclusion
LGBM, stacking, HGBC, RF, ET, bagging, and GBC produced more stable results than the other classifiers on each dataset.
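For reference, the five evaluation metrics named in the Methods can be computed from hard predictions and continuous scores as sketched below. This is a minimal illustration, not the authors' code: it assumes binary labels with `1` as the positive (diabetic) class, the function names are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half), which equals the area under the ROC curve. In practice, scikit-learn's `sklearn.metrics` module offers equivalents such as `precision_score` and `roc_auc_score`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from hard predictions (0.0 when a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC-ROC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == positive]
    neg = [s for s, t in zip(scores, y_true) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(n_pos × n_neg), which is acceptable for datasets of roughly a thousand records such as those described here; for much larger sets, a sort-based rank computation yields the same value in O(n log n).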