This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Task-Based Sampling of Patient Data for Rigorous Machine Learning/AI Performance Assessment
Citations: 0
Authors: 9
Year: 2026
Abstract
To assess the performance of an AI algorithm, an independent dataset is needed that matches the intended clinical claim and intended population (e.g., patient characteristics) for which the algorithm is meant. Using all available data for performance assessment may not be practical or optimal; to reduce the risk of sampling bias, the user is expected to utilize training and test data that are representative of the intended population. This work outlines a computational method for task-based sampling of data from a large repository and demonstrates its use, utilizing demographic characteristics and disease states as examples of the clinical attributes to match to an intended population. To run our developed task-based sampling algorithm, the user defines the initial cohort from which to sample, a target distribution profile, and a maximum allowable deviation in any subcategory. The functionality and results of the developed workflow are described in the context of sampling the Medical Imaging and Data Resource Center (MIDRC) data commons for algorithm performance assessment. An initial cohort of over 4000 patients was selected from the MIDRC public data commons. The task-based sampling algorithm was used to select samples matched to an approximate CDC demographic distribution with maximum allowable deviations of 5% and 10%. Resulting final cohorts of 542 and 870 unique patients with average clinical attribute differences of 1.0% and 2.1% were sampled, respectively. This investigation demonstrates that the developed task-based sampling algorithm can generate matched samples from a large dataset for reducing sampling bias in algorithm training and performance assessment.
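The abstract names three user inputs: an initial cohort, a target distribution profile, and a maximum allowable deviation in any subcategory. The sketch below shows one way those inputs could drive a sampler. It is not the authors' published algorithm. It swaps in a simple greedy trimming heuristic, and the function names (`deviations`, `sample_to_target`), the dictionary-based patient records, and the choice to measure deviation as the absolute difference between observed and target share are all assumptions made for illustration.

```python
from collections import Counter
import random


def deviations(cohort, targets):
    """Return {(attribute, category): observed share - target share}."""
    gaps = {}
    for attr, target in targets.items():
        counts = Counter(patient[attr] for patient in cohort)
        # Categories missing from the target profile are treated as target 0.
        for cat in set(counts) | set(target):
            gaps[(attr, cat)] = counts[cat] / len(cohort) - target.get(cat, 0.0)
    return gaps


def sample_to_target(cohort, targets, max_dev, seed=0):
    """Greedily drop patients until every subcategory is within max_dev.

    Hypothetical helper: `cohort` is a list of dicts, `targets` maps each
    attribute to {category: target share}, `max_dev` is a fraction (0.05 = 5%).
    """
    for attr, target in targets.items():
        if abs(sum(target.values()) - 1.0) > 1e-9:
            raise ValueError(f"target shares for {attr!r} must sum to 1")

    rng = random.Random(seed)
    selected = list(cohort)
    rng.shuffle(selected)  # avoid tying removals to repository order

    while selected:
        gaps = deviations(selected, targets)
        if max(abs(gap) for gap in gaps.values()) <= max_dev:
            return selected
        # Remove one patient from the most over-represented subcategory.
        (attr, cat), _ = max(gaps.items(), key=lambda item: item[1])
        for i, patient in enumerate(selected):
            if patient[attr] == cat:
                del selected[i]
                break

    raise ValueError("no subset within the allowed deviation was found")
```

Because the heuristic only removes patients, the returned cohort is always a subset of the initial one, and a tighter `max_dev` generally leaves fewer patients. That is in line with the smaller cohort the abstract reports for the 5% limit than for the 10% limit. A greedy pass is not guaranteed to find the largest matching subset when several attributes interact.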
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,400 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,261 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,695 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,506 citations