Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
A Comparison of Machine Learning Techniques for the Detection of Type-2 Diabetes Mellitus: Experiences from Bangladesh
37
Zitationen
9
Autoren
2023
Jahr
Abstract
Diabetes is a chronic disease caused by a persistently high blood sugar level, causing other chronic diseases, including cardiovascular, kidney, eye, and nerve damage. Prompt detection plays a vital role in reducing the risk and severity associated with diabetes, and identifying key risk factors can help individuals become more mindful of their lifestyles. In this study, we conducted a questionnaire-based survey utilizing standard diabetes risk variables to examine the prevalence of diabetes in Bangladesh. To enable prompt detection of diabetes, we compared different machine learning techniques and proposed an ensemble-based machine learning framework that incorporated algorithms such as decision tree, random forest, and extreme gradient boost algorithms. In order to address class imbalance within the dataset, we initially applied the synthetic minority oversampling technique (SMOTE) and random oversampling (ROS) techniques. We evaluated the performance of various classifiers, including decision tree (DT), logistic regression (LR), support vector machine (SVM), gradient boost (GB), extreme gradient boost (XGBoost), random forest (RF), and ensemble technique (ET), on our diabetes datasets. Our experimental results showed that the ET outperformed other classifiers; to further enhance its effectiveness, we fine-tuned and evaluated the hyperparameters of the ET. Using statistical and machine learning techniques, we also ranked features and identified that age, extreme thirst, and diabetes in the family are significant features that prove instrumental in the detection of diabetes patients. This method has great potential for clinicians to effectively identify individuals at risk of diabetes, facilitating timely intervention and care.
Ähnliche Arbeiten
Biostatistical Analysis
1996 · 35.450 Zit.
UCI Machine Learning Repository
2007 · 24.319 Zit.
An introduction to ROC analysis
2005 · 20.981 Zit.
Prediction of Coronary Heart Disease Using Risk Factor Categories
1998 · 9.605 Zit.
The use of the area under the ROC curve in the evaluation of machine learning algorithms
1997 · 7.188 Zit.