This is an overview page with metadata for this scientific article. The full article is available from the publisher.
A Machine Learning–Based Pilot Study for the Classification of Thalassemia Subtypes Using Routine Laboratory Parameters
Citations: 0 · Authors: 4 · Year: 2026
Abstract
Objective: Thalassemia is a hereditary hemoglobinopathy that remains a significant public health problem, particularly in Mediterranean regions. Although genetic testing is the gold standard for subtype classification, access to it is limited in many clinical settings. This pilot study explored the feasibility of using machine learning models based on routinely available clinical and laboratory parameters to support the differentiation of thalassemia subtypes when genetic testing is unavailable.
Methods: This retrospective cross-sectional study included 83 individuals (57 with thalassemia major, 11 with thalassemia intermedia, and 15 healthy controls). Demographic, clinical, and laboratory variables were analyzed using the R programming language. A supervised Random Forest algorithm was applied for multiclass classification. Model performance was assessed using accuracy, class-specific sensitivity and specificity, and the area under the receiver operating characteristic curve (AUC). To further evaluate the distinction between thalassemia major and intermedia, a simplified logistic regression model was constructed, and Firth logistic regression was applied to address the small sample size and class imbalance.
Results: The Random Forest model achieved an overall test-set accuracy of 85.7%. Sensitivity was 80% for thalassemia major and 100% for both thalassemia intermedia and healthy controls. Variable importance analysis identified red cell distribution width (RDW), hematocrit, ferritin, and hemoglobin as the most influential predictors. In the simplified logistic regression model distinguishing thalassemia major from intermedia, RDW was the only statistically significant variable (p = 0.0476). Model performance metrics, including high AUC values, should be interpreted cautiously given the limited sample size.
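Firth's correction maximizes the log-likelihood penalized by half the log-determinant of the Fisher information. This removes the first-order small-sample bias of ordinary maximum likelihood and keeps estimates finite under complete separation, which is a real risk when one group (here thalassemia intermedia, n = 11) is small. The sketch below is a minimal pure-Python illustration for an intercept plus a single predictor. The function name `firth_logistic_1d`, the step cap, and the toy data are hypothetical and are not the authors' code; the abstract does not say which implementation was used, and the study's simplified model included more than one variable. In R, Firth regression is commonly fitted with the `logistf` package.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def firth_logistic_1d(x, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Hypothetical helper: Firth-penalized logistic regression, intercept + one predictor.

    Iteratively solves the modified score equations
        sum_i (y_i - p_i + h_i * (0.5 - p_i)) * [1, x_i] = 0,
    where h_i is the i-th diagonal element of the weighted hat matrix
    W^(1/2) X (X^T W X)^(-1) X^T W^(1/2). Returns (intercept, slope).
    Assumes x is not constant (otherwise X^T W X is singular).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [_sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X^T W X for the design columns [1, x]: [[a, b], [b, c]].
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        i00, i01, i11 = c / det, -b / det, a / det  # entries of the symmetric inverse
        # Leverages h_i = w_i * [1, x_i] (X^T W X)^(-1) [1, x_i]^T.
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified residuals; the h_i * (0.5 - p_i) term is the bias correction.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Fisher-scoring step on the modified score: delta = (X^T W X)^(-1) U*.
        d0 = i00 * u0 + i01 * u1
        d1 = i01 * u0 + i11 * u1
        # Cap the largest component of the step to keep early iterations stable.
        largest = max(abs(d0), abs(d1))
        if largest > max_step:
            d0, d1 = d0 * max_step / largest, d1 * max_step / largest
        b0, b1 = b0 + d0, b1 + d1
        if largest < tol:
            break
    return b0, b1


# Hypothetical toy data: a binary predictor that perfectly separates the outcome,
# a case in which ordinary maximum likelihood has no finite estimate.
x = [0] * 5 + [1] * 5
y = [0] * 5 + [1] * 5
b0, b1 = firth_logistic_1d(x, y)
print(f"intercept={b0:.4f}, slope={b1:.4f}")  # prints: intercept=-2.3979, slope=4.7958
```

The printed values are -ln 11 and 2 ln 11: with a single binary predictor, Firth's estimate equals the log odds ratio computed after adding 0.5 to each cell of the 2×2 table, so it stays finite where the unpenalized fit diverges. A production analysis would also use step-halving on the penalized likelihood and profile penalized-likelihood confidence intervals, which dedicated implementations such as `logistf` typically provide.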
Similar works
Automatic Recording Apparatus for Use in Chromatography of Amino Acids
1958 · 9,602 citations
Enzymatic Amplification of β-Globin Genomic Sequences and Restriction Site Analysis for Diagnosis of Sickle Cell Anemia
1985 · 8,996 citations
Estimation of total, protein-bound, and nonprotein sulfhydryl groups in tissue with Ellman's reagent
1968 · 7,952 citations
Hepcidin Regulates Cellular Iron Efflux by Binding to Ferroportin and Inducing Its Internalization
2004 · 4,722 citations
A novel MHC class I–like gene is mutated in patients with hereditary haemochromatosis
1996 · 3,705 citations