OpenAlex · Updated hourly · Last updated: 21 March 2026, 22:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters

2026 · 0 citations · Turkish Journal of Internal Medicine · Open Access

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Objective: Thalassemia is a hereditary hemoglobinopathy and remains a significant public health problem, particularly in Mediterranean regions. Although genetic testing is the gold standard for subtype classification, access to it is limited in many clinical settings. This pilot study explored the feasibility of using machine learning models built on routinely available clinical and laboratory parameters to support the differentiation of thalassemia subtypes when genetic testing is unavailable.

Methods: This retrospective cross-sectional study included 83 individuals (57 with thalassemia major, 11 with thalassemia intermedia, and 15 healthy controls). Demographic, clinical, and laboratory variables were analyzed in the R programming language. A supervised Random Forest algorithm was applied for multiclass classification. Model performance was assessed using accuracy, class-specific sensitivity and specificity, and the area under the receiver operating characteristic curve (AUC). To further evaluate the distinction between thalassemia major and intermedia, a simplified logistic regression model was constructed, and Firth logistic regression was applied to address the small sample size and class imbalance.

Results: The Random Forest model achieved an overall test-set accuracy of 85.7%. Sensitivity was 80% for thalassemia major and 100% for both thalassemia intermedia and healthy controls. Variable importance analysis identified red cell distribution width (RDW), hematocrit, ferritin, and hemoglobin as the most influential predictors. In the simplified logistic regression model distinguishing thalassemia major from intermedia, RDW was the only variable reaching statistical significance (p = 0.0476). Model performance metrics, including the high AUC values, should be interpreted cautiously given the limited sample size.
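The abstract says Firth logistic regression was used for the major-versus-intermedia comparison but gives no implementation details, and the authors worked in R. As an illustration only, the Python sketch below fits a one-predictor logistic model by maximizing Firth's penalized log-likelihood, l(β) + ½·log det I(β), using Fisher scoring with step halving. The predictor values, the 0/1 outcome coding, and all function names are hypothetical. This is not the study's code or data.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z):
    # log(1 + exp(z)) without overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _fisher(beta, xs):
    """Probabilities, weights, det and inverse of the 2x2 information X^T W X."""
    b0, b1 = beta
    ps = [_sigmoid(b0 + b1 * x) for x in xs]
    ws = [p * (1.0 - p) for p in ps]
    i00 = sum(ws)
    i01 = sum(w * x for w, x in zip(ws, xs))
    i11 = sum(w * x * x for w, x in zip(ws, xs))
    det = i00 * i11 - i01 * i01
    inv = ((i11 / det, -i01 / det), (-i01 / det, i00 / det))
    return ps, ws, det, inv


def penalized_loglik(beta, xs, ys):
    """Firth objective: log-likelihood + 0.5 * log det I(beta)."""
    b0, b1 = beta
    ll = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        # log p = -softplus(-z), log(1 - p) = -softplus(z)
        ll -= _softplus(-z) if y else _softplus(z)
    _, _, det, _ = _fisher(beta, xs)
    return ll + 0.5 * math.log(det)


def modified_score(beta, xs, ys):
    """Gradient of the Firth objective: sum_i (y_i - p_i + h_i (1/2 - p_i)) [1, x_i]."""
    ps, ws, _, inv = _fisher(beta, xs)
    u0 = u1 = 0.0
    for x, y, p, w in zip(xs, ys, ps, ws):
        # Leverage h_i = w_i * [1, x_i] I^{-1} [1, x_i]^T
        h = w * (inv[0][0] + 2.0 * inv[0][1] * x + inv[1][1] * x * x)
        r = y - p + h * (0.5 - p)
        u0 += r
        u1 += r * x
    return u0, u1


def firth_logistic(xs, ys, tol=1e-10, max_iter=200):
    """Fit logit P(y=1) = b0 + b1*x by Firth-penalized maximum likelihood."""
    beta = (0.0, 0.0)
    for _ in range(max_iter):
        _, _, _, inv = _fisher(beta, xs)
        u0, u1 = modified_score(beta, xs, ys)
        # Fisher-scoring step: I^{-1} U*
        step = (inv[0][0] * u0 + inv[0][1] * u1,
                inv[1][0] * u0 + inv[1][1] * u1)
        if max(abs(step[0]), abs(step[1])) < tol:
            return beta
        current = penalized_loglik(beta, xs, ys)
        t = 1.0
        for _ in range(30):
            # Step halving: shrink until the penalized log-likelihood does not drop.
            candidate = (beta[0] + t * step[0], beta[1] + t * step[1])
            if penalized_loglik(candidate, xs, ys) >= current - 1e-12:
                break
            t /= 2.0
        beta = candidate
    raise RuntimeError("Firth iteration did not converge")


if __name__ == "__main__":
    # Hypothetical standardized predictor; the two classes are completely separated.
    xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
    ys = [0, 0, 0, 1, 1, 1]
    print(firth_logistic(xs, ys))
```

On perfectly separated data like this toy example, ordinary maximum likelihood drives the slope toward infinity. The ½·log det I penalty keeps the estimate finite, which is why the method is often chosen for small or imbalanced groups such as the 11 intermedia cases here. In R, the `logistf` package implements Firth's method.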

Topics

Hemoglobinopathies and Related Disorders · Artificial Intelligence in Healthcare and Education · Myeloproliferative Neoplasms: Diagnosis and Treatment