Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Machine Learning in Predicting Diabetes in the Early Stage
28
Zitationen
1
Autoren
2020
Jahr
Abstract
Diabetes is a common disease and its early symptoms are not very noticeable, so an efficient method of prediction will help patients make a self-diagnosis. However, the conventional method to identify diabetes is to make a blood glucose test by doctors and the medical resource is limited. Therefore, most patients cannot get the diagnosis immediately. Since the early symptoms of diabetes are not obvious and the relationship between symptoms and diabetes is complex, the self-diagnosis results based on patients' own experience are not accurate. The process of Machine Learning is to train a computational algorithm for prediction based on a big dataset. It is popular for its efficiency and accuracy. Also, it has the advantage of dealing with tons of data, so we can make diagnoses for plenty of patients in a short time and the result will be more accurate. In this study, we used six classical machine learning models, including logistic regression, support vector machine, decision tree, random forest, boosting and neural network, to make a prediction model for diabetes diagnosis. Our data was from UCI Machine Learning Repository, which was collected by direct questionnaires from the patients of the Sylhet Diabetes Hospital in Sylhet, Bangladesh and approved by a doctor. We conduct parameter tuning on each model to tradeoff between the accuracy and complexity. The testing error shows that random forest, boosting and neural network had better performances than logistic regression, support vector machine and decision tree. The accuracy of neural network of the test dataset achieves 96 percent, which is the best model among these models for predicting diabetes.
Ähnliche Arbeiten
Biostatistical Analysis
1996 · 35.449 Zit.
UCI Machine Learning Repository
2007 · 24.319 Zit.
An introduction to ROC analysis
2005 · 20.946 Zit.
Prediction of Coronary Heart Disease Using Risk Factor Categories
1998 · 9.604 Zit.
The use of the area under the ROC curve in the evaluation of machine learning algorithms
1997 · 7.183 Zit.