This is an overview page with metadata for this scientific work. The full article is available from the publisher.
The impact of feature combinations on machine learning models for in-hospital mortality prediction
Citations: 1
Authors: 4
Year: 2025
Abstract
The growing volume of healthcare data presents opportunities for machine learning to improve treatment, uncover new patterns in the data and predict patient outcomes. Selecting appropriate features for a machine learning model is an important step in the process, as the choice of relevant variables directly influences the model's performance and interpretability. Effective feature selection can enhance both the accuracy and generalisability of the model, especially given the complexity and heterogeneity of healthcare data. The XGBoost algorithm is trained on the eICU Collaborative Research Database to predict in-hospital mortality, with a focus on investigating the impact of different feature sets. The analysis cohort comprised 73 210 patients. Different models are trained and tested using 20 000 distinct feature sets, each containing ten features, to assess how different features influence model performance. The models are trained using a train/test split of 80/20. Shapley additive explanations (SHAP) values are used to evaluate the importance of individual features. On average, the feature sets achieve an area under the receiver operating characteristic curve (AUROC) of 0.811, with the highest AUROC of 0.832 obtained from the feature set comprising [admission diagnosis, age, albumin, creatinine, heart rate, mean blood pressure, motor (from Glasgow Coma Scale), respiratory rate, temperature, unit admit source]. Despite variations in feature composition, the models exhibit comparable performance in terms of both AUROC and the area under the precision-recall curve (AUPRC). Overall, age emerges as particularly influential, appearing most frequently in the feature sets associated with the highest AUROC scores. However, this trend is not observed for AUPRC. The results show that different models can achieve similar discrimination with different feature sets and that feature importance and ranking vary accordingly.
This suggests that there may be multiple routes to good performance and that evaluating several feature combinations could be more informative than focusing on a single best set. Average feature importances may not reliably indicate a variable's overall utility or real-world importance and should be interpreted within the context of specific combinations. Prospective evaluation of promising sets and attention to robustness across combinations may help guide validation and eventual clinical use.
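The subset-evaluation procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses synthetic data in place of the eICU database (which requires credentialed access), scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, and only 5 random 10-feature subsets rather than the 20 000 used in the study. All variable names and data shapes here are assumptions for demonstration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient cohort: 20 candidate features,
# binary in-hospital mortality label. The outcome depends on a few
# features so that different subsets differ in usefulness.
n_samples, n_features = 2000, 20
X = rng.normal(size=(n_samples, n_features))
logits = 1.2 * X[:, 0] - 0.8 * X[:, 3] + 0.5 * X[:, 7]
y = (rng.random(n_samples) < 1 / (1 + np.exp(-logits))).astype(int)

# 80/20 train/test split, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

def auroc_for_subset(cols):
    """Train one model on a feature subset and return its test AUROC."""
    model = GradientBoostingClassifier(n_estimators=50, random_state=0)
    model.fit(X_train[:, cols], y_train)
    scores = model.predict_proba(X_test[:, cols])[:, 1]
    return roc_auc_score(y_test, scores)

# Sample distinct random 10-feature subsets and rank them by AUROC.
subsets = [
    sorted(rng.choice(n_features, size=10, replace=False).tolist())
    for _ in range(5)
]
results = sorted(((auroc_for_subset(c), c) for c in subsets), reverse=True)
best_auroc, best_subset = results[0]
print(f"best AUROC {best_auroc:.3f} with features {best_subset}")
```

Ranking many subsets this way, rather than searching for a single best set, is what allows the comparison the abstract describes: similar discrimination can emerge from quite different feature combinations.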
Related works
"Why Should I Trust You?"
2016 · 14,210 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,586 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,383 citations