This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A comparative feature selection study: Predicting Alzheimer's disease using primary healthcare and social services data
Citations: 1 · Authors: 7 · Year: 2025
Abstract
This study investigates the use of different feature selection techniques to improve the performance of machine learning (ML) models for the early prediction of Alzheimer's disease (AD), using primary healthcare and social services data from a cohort of 26,828 residents aged 65 years and older in Kuopio, Finland. We compared pre-classifier feature selection approaches, such as analysis of variance (ANOVA) and mutual information (MI), with post-classifier approaches, such as SHapley Additive exPlanations (SHAP). We assessed six ML models and found that feature selection improved performance over using all features: with 50 SHAP-selected features, 3–4 years before clinical confirmation of the disease, XGBoost achieved the highest AUC (0.755) and Logistic Regression the highest balanced accuracy (0.668). The most predictive features originated from primary healthcare, particularly ICPC and ICD-10 codes for dementia and mild cognitive impairment. The results underscore the importance of feature selection for improving both performance and interpretability in early AD prediction, and they highlight the need to tailor feature selection to the ML model and the characteristics of the dataset. The contribution of our work lies in integrating primary healthcare data with social services data for AD prediction, which prior studies have not explored. Moreover, whereas most studies relied on a single feature selection approach, we compare several approaches to identify the most effective methods for capturing AD risk factors. Future work should address the limitations of this study, including parameter optimization, data imbalance, small AD sample sizes, and a single geographic cohort, and should incorporate additional features such as imaging biomarkers to enhance prediction.
Highlights
• Evaluated pre- and post-classifier feature selection to enhance early Alzheimer's disease prediction via machine learning.
• Analyzed data from 26,828 adults (65+) in Kuopio, Finland, from primary healthcare and social services.
• SHAP-based feature selection achieved the best performance: XGBoost AUC 0.755, Logistic Regression balanced accuracy 0.668.
• Top predictors are primary care diagnosis codes (ICPC and ICD-10) for dementia and mild cognitive impairment.
• Tailored feature selection improves the prediction and interpretability of models for early Alzheimer's disease detection.
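The abstract contrasts a pre-classifier filter (ANOVA), a post-classifier ranking (SHAP), and two evaluation metrics (AUC and balanced accuracy). The following is a minimal pure-Python sketch of those pieces, assuming binary labels (1 = later AD diagnosis, 0 = no AD) and features stored as named columns; it is not the authors' pipeline. The ANOVA score is the one-way F statistic (the same quantity scikit-learn's `f_classif` reports); mutual information is not shown. For the SHAP step, the sketch swaps in the closed-form SHAP values of a linear log-odds model under a feature-independence assumption, rather than a tree explainer for XGBoost. All function names, feature names, weights, and toy data are hypothetical.

```python
from statistics import fmean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across the class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = fmean(values)
    means = {y: fmean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:  # every class is internally constant
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(columns, labels, k):
    """Pre-classifier filter: keep the k features with the largest F statistic."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def linear_shap_ranking(columns, weights, k):
    """Post-classifier ranking by mean |SHAP| for a linear (log-odds) model.

    Assumes independent features, where SHAP has the closed form
    phi_i(x) = w_i * (x_i - mean(x_i)). The weights are hypothetical inputs.
    """
    importance = {}
    for name, col in columns.items():
        mu = fmean(col)
        importance[name] = fmean(abs(weights[name] * (v - mu)) for v in col)
    return sorted(importance, key=importance.get, reverse=True)[:k]


def roc_auc(scores, labels):
    """AUC = P(random positive scores above random negative); ties count 1/2. O(n^2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(preds, labels):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    n_pos = sum(y == 1 for y in labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical toy cohort: 1 = later AD diagnosis, 0 = no AD.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "dementia_code": [0, 0, 0, 1, 1, 1, 1, 0],  # hypothetical diagnosis-code flag
    "age": [70, 82, 75, 68, 77, 71, 80, 74],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
}
print(select_top_k(columns, labels, 2))  # ['dementia_code', 'age']
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(balanced_accuracy([1, 0, 0, 0], [1, 1, 0, 0]))  # 0.75
```

In the linear-SHAP ranking, a feature with a smaller weight can still rank higher if it varies more across patients, which is one reason SHAP-based selection can differ from ranking by raw model coefficients. Balanced accuracy is reported alongside AUC because, with few AD cases, plain accuracy rewards a model that predicts "no AD" for everyone.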
Similar works
Biostatistical Analysis (1996 · 35,449 citations)
UCI Machine Learning Repository (2007 · 24,319 citations)
An introduction to ROC analysis (2005 · 20,836 citations)
The use of the area under the ROC curve in the evaluation of machine learning algorithms (1997 · 7,158 citations)
A method of comparing the areas under receiver operating characteristic curves derived from the same cases (1983 · 7,076 citations)