This is an overview page with metadata about this scientific paper. The full article is available from the publisher.
An enhanced explainable thyroid disease diagnosis by leveraging Cluster-SMOTE and machine learning models
Citations: 0
Authors: 4
Year: 2026
Abstract
Thyroid disorders are a major public health concern worldwide. They disrupt metabolic regulation and, when not detected early, increase the risk of cardiovascular and systemic complications. Existing machine learning (ML) approaches to thyroid disease prediction are often limited by severe class imbalance, suboptimal probability calibration, and a lack of model interpretability. This study combines the Cluster-based Synthetic Minority Oversampling Technique (Cluster-SMOTE), which oversamples the minority class while preserving its cluster structure, with multiple ML classifiers. Judged by F1-score, the Random Forest classifier performed best. Model reliability was further assessed with calibration analysis, the Brier score, and Decision Curve Analysis (DCA). SHapley Additive exPlanations (SHAP) were used to provide both global and local explanations of model predictions. On a publicly available thyroid disease dataset, the proposed Random Forest-based framework achieved an F1-score, accuracy, precision, recall, and AUC of 0.99 each, and a Brier score of 0.003. DCA further confirmed that the proposed model yields a higher net clinical benefit across a wide range of threshold probabilities. These findings demonstrate that combining Cluster-SMOTE, a robust Random Forest classifier, and explainable AI (XAI) validation produces an accurate, well-calibrated, and clinically interpretable framework for thyroid disease prediction.
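The abstract does not specify how Cluster-SMOTE is implemented. As a minimal sketch of the general idea, assuming k-means clustering of the minority class followed by SMOTE-style interpolation restricted to each cluster, the Python below shows why this preserves minority structure: synthetic points are only generated between neighbours from the same cluster. The function names `kmeans` and `cluster_smote`, the size-proportional allocation of samples to clusters, and the toy data are illustrative assumptions, not the authors' code.

```python
"""Cluster-based SMOTE sketch: oversample the minority class within clusters.

Assumption (not from the paper): k-means (Lloyd's algorithm) splits the
minority class into clusters, then each synthetic sample is interpolated
between a point and one of its nearest neighbours from the SAME cluster.
"""
import math
import random
from typing import List

Point = List[float]


def kmeans(points: List[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Return a cluster label for each point (plain Lloyd's algorithm)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: centroid = mean of its members (keep old one if empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def cluster_smote(minority: List[Point], n_synthetic: int, n_clusters: int = 2,
                  k_neighbors: int = 3, seed: int = 0) -> List[Point]:
    """Generate n_synthetic minority samples by within-cluster interpolation."""
    rng = random.Random(seed)
    labels = kmeans(minority, n_clusters, rng)
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    # A single-point cluster has no neighbour to interpolate towards.
    usable = [m for m in clusters.values() if len(m) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more points")
    weights = [len(m) for m in usable]  # larger clusters receive more samples
    synthetic = []
    for _ in range(n_synthetic):
        members = rng.choices(usable, weights=weights)[0]
        i = rng.randrange(len(members))
        base = members[i]
        others = [m for j, m in enumerate(members) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Two well-separated minority subgroups (toy data, not from the paper).
    minority = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
                [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [10.2, 10.8]]
    for point in cluster_smote(minority, n_synthetic=5, seed=42):
        print([round(v, 2) for v in point])
```

Each printed point lies inside the region of one of the two subgroups rather than in the empty space between them, which plain SMOTE applied to the whole minority class could populate.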
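The Brier score and the net benefit shown in a decision curve can both be computed directly from predicted probabilities. The sketch below uses the standard definitions: the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome, and the net benefit at threshold probability p_t is TP/n - FP/n * p_t / (1 - p_t). The helper names (`brier_score`, `net_benefit`, `treat_all_net_benefit`) and the toy labels and probabilities are hypothetical, not the study's data or code.

```python
"""Brier score and decision-curve net benefit, computed from scratch.

y_true holds 0/1 labels and y_prob the predicted probability of the positive
(disease) class. Helper names and toy data are illustrative assumptions.
"""
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference DCA strategy: treat every patient regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1, 0, 0]
    p = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.05, 0.4]
    print(f"Brier score: {brier_score(y, p):.4f}")
    for t in (0.1, 0.3, 0.5):
        print(f"pt={t:.1f}  model NB={net_benefit(y, p, t):.3f}  "
              f"treat-all NB={treat_all_net_benefit(y, t):.3f}")
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and plotting it together with the treat-all curve and the treat-none line (net benefit 0). A model is clinically useful at those thresholds where its curve lies above both references.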
Similar works
Biostatistical Analysis (1996; 35,445 citations)
UCI Machine Learning Repository (2007; 24,290 citations)
An introduction to ROC analysis (2005; 20,586 citations)
The use of the area under the ROC curve in the evaluation of machine learning algorithms (1997; 7,096 citations)
A method of comparing the areas under receiver operating characteristic curves derived from the same cases (1983; 7,061 citations)