This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Beyond Filtering: Leveraging Instance Hardness for Data-Centric Machine Learning in Healthcare
Citations: 1
Authors: 3
Year: 2025
Abstract
Machine learning models often struggle to generalize to real-world healthcare settings due to data imperfections such as noise, biases, and label inconsistencies. A promising approach to improving model robustness is instance hardness analysis, which quantifies the difficulty of classifying individual data points in a dataset. While previous studies have explored filtering hard instances to enhance predictive performance, our findings indicate that this strategy can lead to overfitting and reduced generalization on external datasets. This work systematically evaluates the effects of filtering hard instances across multiple healthcare-related datasets, revealing the nuanced relationship between data complexity and model performance. Beyond filtering, we propose alternative data-centric strategies, including expert-guided instance selection, explainability-driven insights, and fairness-aware approaches, to ensure that machine learning models remain both accurate and equitable. Our results highlight the need for a holistic view of data quality, moving beyond naive filtering approaches to leverage instance hardness as a tool for dataset refinement and model improvement.
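The abstract does not say which instance hardness measure the authors use. The sketch below is an illustration only. It assumes the k-Disagreeing Neighbors (kDN) measure, one common way to score instance hardness: the fraction of an instance's k nearest neighbours whose label differs from its own. It then applies the naive threshold filtering that the abstract warns can cause overfitting. The function names, the Euclidean metric, k = 3, the 0.5 threshold and the toy data are all hypothetical choices, not the paper's settings.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_disagreeing_neighbors(X: Sequence[Point], y: Sequence[int], k: int = 3) -> List[float]:
    """Hypothetical helper: kDN hardness score for every instance.

    kDN(x) = (# of x's k nearest neighbours with a different label) / k,
    using Euclidean distance and excluding x itself.
    """
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < number of instances")
    scores = []
    for i, xi in enumerate(X):
        # Sort all other instances by distance (index breaks ties deterministically).
        ranked = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours = [j for _, j in ranked[:k]]
        disagree = sum(1 for j in neighbours if y[j] != y[i])
        scores.append(disagree / k)
    return scores


def filter_hard_instances(
    X: Sequence[Point], y: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: naive filtering that drops every instance with kDN >= threshold."""
    kept = [(xi, yi) for xi, yi, s in zip(X, y, scores) if s < threshold]
    return [xi for xi, _ in kept], [yi for _, yi in kept]


if __name__ == "__main__":
    # Toy data: two clusters; (0.5, 0.5) sits in the class-0 cluster but is labelled 1.
    X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (0.5, 0.5)]
    y = [0, 0, 0, 1, 1, 1, 1]
    scores = k_disagreeing_neighbors(X, y, k=3)
    print(scores)  # the mislabelled point (0.5, 0.5) receives kDN = 1.0
    X_kept, y_kept = filter_hard_instances(X, y, scores, threshold=0.5)
    print(len(X_kept), "of", len(X), "instances kept")
```

The filter only discards data. Under the alternatives the abstract proposes, high-kDN instances would be flagged for expert review, explainability analysis or fairness checks rather than silently removed, because a "hard" instance may be a label error or a legitimate but under-represented case.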