This is an overview page with metadata for this research paper. The full article is available from the publisher.

A comparative feature selection study: Predicting Alzheimer's disease using primary healthcare and social services data

2025 · 1 citation · Informatics in Medicine Unlocked · Open Access


Abstract

This study investigates the use of different feature selection techniques to improve the performance of machine learning (ML) models for the early prediction of Alzheimer's disease (AD), using primary healthcare and social services data from a cohort of 26,828 residents aged 65 years and older in Kuopio, Finland. We compared pre-classifier feature selection approaches, such as analysis of variance (ANOVA) and mutual information (MI), with post-classifier approaches, such as SHapley Additive exPlanations (SHAP). We assessed six ML models, and feature selection improved performance over using all features. Using 50 SHAP-selected features, XGBoost achieved the highest AUC (0.755) and Logistic Regression the highest balanced accuracy (0.668), 3–4 years before clinical confirmation of the disease. The most predictive features originated from primary healthcare, particularly ICPC and ICD-10 codes for dementia and mild cognitive impairment. The results underscore the importance of feature selection for improving both performance and interpretability in early AD prediction, and highlight the need to tailor feature selection to the ML model and the characteristics of the dataset. The contribution of our work lies in integrating primary healthcare data with social services data for AD prediction, which prior studies have not explored. Moreover, whereas most studies relied on a single feature selection approach, we compare several approaches to identify the most effective methods for capturing AD risk factors. Future work should address the limitations of this study, including parameter optimization, data imbalance, small AD sample sizes, and the single geographic cohort, and should incorporate additional features such as imaging biomarkers to enhance prediction.

Highlights

• Evaluated pre- and post-classifier feature selection to enhance early Alzheimer’s disease prediction via machine learning.
• Analyzed primary healthcare and social services data from 26,828 adults aged 65+ in Kuopio, Finland.
• SHAP-based feature selection achieved the best performance: XGBoost AUC 0.755, Logistic Regression balanced accuracy 0.668.
• Top predictors were primary care diagnosis codes (ICPC and ICD-10) for dementia and mild cognitive impairment.
• Tailored feature selection improves the prediction and interpretability of models for early Alzheimer’s disease detection.
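The abstract contrasts pre-classifier filters (ANOVA, MI) with post-classifier SHAP selection and reports AUC and balanced accuracy. The sketch below illustrates only the ANOVA filter and the two metrics in dependency-free Python, assuming numeric features and a binary label (1 = AD, 0 = no AD). The helper names (`anova_f`, `select_top_k`, `roc_auc`, `balanced_accuracy`) and the toy data are hypothetical and are not the authors' code; MI and SHAP selection are not shown.

```python
from statistics import mean

# Hypothetical helpers for illustration only; not the study's pipeline.

def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across label groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Between-group variability: how far each class mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Within-group variability: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X, y, k):
    """Pre-classifier filter: indices of the k columns with the largest F-statistic."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy data (invented): 6 patients, 3 numeric features, label 1 = AD.
X = [[1.0, 5.0, 0.2], [1.1, 3.0, 0.9], [0.9, 4.0, 0.5],
     [3.0, 4.5, 0.4], [3.2, 3.5, 0.6], [2.9, 5.5, 0.1]]
y = [0, 0, 0, 1, 1, 1]
print(select_top_k(X, y, 2))                                    # [0, 2]
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))             # 0.75
print(round(balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1]), 3))  # 0.833
```

Balanced accuracy averages per-class recall, so it is not inflated by the majority non-AD class, which matters given the data imbalance the abstract lists as a limitation. In practice, scikit-learn's `f_classif` (with `SelectKBest`), `roc_auc_score` and `balanced_accuracy_score` provide library equivalents of these helpers.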

Topics

Artificial Intelligence in Healthcare · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
