OpenAlex · Updated hourly · Last updated: 21.03.2026, 07:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Predicting Thyroid Dysfunction Using Classical Machine Learning With Rigorous Statistical Evaluation

2026 · 0 citations · IEEE Access · Open Access

Citations: 0 · Authors: 5 · Year: 2026

Abstract

Thyroid disorders are among the most prevalent endocrine conditions worldwide and exert far-reaching effects on metabolic, cardiovascular and neurological systems. Early diagnosis remains challenging because the clinical manifestations of hypo- and hyperthyroidism are non-specific, symptoms evolve slowly and laboratory thresholds vary across populations. Meanwhile, underdiagnosis and delayed treatment contribute to morbidity, healthcare costs and reduced quality of life. Machine learning (ML) offers an opportunity to integrate demographic, clinical and laboratory data to support clinicians in identifying dysfunction at an early stage. We reanalyse a publicly available thyroid disease cohort consisting of 377 euthyroid cases and 61 cases spanning clinical and subclinical hyper- and hypothyroidism. Despite the modest sample size and severe class imbalance (smallest class n = 7), our methodologically rigorous approach provides a transparent baseline for future work. After rigorous preprocessing (missing-value imputation, one-hot encoding and standardisation), we evaluate six classical classifiers (logistic regression, decision tree, random forest, support vector machine, k-nearest neighbours and naïve Bayes) using stratified nested cross-validation (outer five-fold for performance estimation; inner cross-validation for hyperparameter tuning). Class imbalance is mitigated through inverse class weighting and synthetic minority oversampling (SMOTE/SMOTE-NC) applied only to training folds to prevent leakage. Hyperparameters are tuned via random search, and the best models are calibrated via Platt scaling and isotonic regression. Performance is summarised as mean ± standard deviation across folds for accuracy, macro-precision, macro-recall, macro-F1, micro- and macro-area under the receiver operating characteristic curve, Brier score and expected calibration error.
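The two calibration metrics named above can be illustrated with a minimal sketch. It is not the authors' code: it assumes the multiclass Brier score is the mean squared distance between the predicted probability vector and the one-hot label, and that expected calibration error is the common top-label variant with equal-width confidence bins. The abstract does not say which variants were used. The function names and the toy probabilities are hypothetical.

```python
from typing import List, Sequence


def multiclass_brier(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot encoded true label (assumed variant)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def expected_calibration_error(
    probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Top-label ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        p = list(p)
        conf = max(p)                      # confidence of the predicted class
        pred = p.index(conf)               # predicted class index
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))
    n = len(labels)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(hit for _, hit in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


# Toy predictions for three classes (illustrative values, not study data).
probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
labels = [0, 1, 1, 2]
brier = multiclass_brier(probs, labels)
ece = expected_calibration_error(probs, labels)
print(f"Brier score: {brier:.4f}, ECE: {ece:.4f}")
```

In a cross-validated setting, both functions would be applied to each outer fold's held-out predictions and then summarised as mean ± standard deviation, matching the reporting style described above.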
The random forest achieves the highest macro-area under the ROC curve (0.99 ± 0.01) and the highest balanced accuracy on this specific dataset, while logistic regression and support vector machines yield competitive performance. However, given the small sample size and severe class imbalance, these results should be validated on larger, independent cohorts before clinical deployment. Statistical comparisons using DeLong and McNemar tests confirm that the random forest significantly outperforms the other models at the α = 0.05 level. Rather than proposing a new model architecture, this work contributes a transparent, leakage-aware and statistically grounded baseline for classical machine learning on this dataset. We conclude by discussing methodological choices, limitations and ethical considerations, and by situating our findings within the context of recent work.
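As an illustration of the paired comparison mentioned above, the following sketch implements McNemar's test in its exact binomial form. Choosing the exact variant is an assumption (it is often preferred when the number of discordant pairs is small), since the abstract does not state which form the authors used. The function name and the toy predictions are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(
    y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]
) -> Tuple[int, int, float]:
    """Two-sided exact McNemar test on the discordant pairs of two classifiers.

    Returns (b, c, p_value), where b counts samples only model A gets right
    and c counts samples only model B gets right.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    k = min(b, c)
    # Under H0 the discordant pairs follow Binomial(n, 0.5); double the smaller tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy example (not study data): model A is always right, model B errs on 6 of 12.
y_true = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
pred_a = list(y_true)
pred_b = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, p={p_value:.5f}")
```

The test uses only the samples on which the two models disagree, so with a minority class as small as n = 7 the number of discordant pairs, and therefore the statistical power, may be very limited.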

Similar works