This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Task-Based Sampling of Patient Data for Rigorous Machine Learning/AI Performance Assessment

2026 · 0 citations · Journal of Imaging Informatics in Medicine · Open Access

Abstract

To assess the performance of an AI algorithm, an independent dataset is needed that matches the intended clinical claim and intended population (e.g., patient characteristics) for which the algorithm is meant. Using all available data for performance assessment may not be practical or optimal; to reduce the risk of sampling bias, the user is expected to utilize training and test data that are representative of the intended population. This work outlines a computational method for task-based sampling of data from a large repository and demonstrates its use, utilizing demographic characteristics and disease states as examples of the clinical attributes to match to an intended population. To run our developed task-based sampling algorithm, the user defines the initial cohort from which to sample, a target distribution profile, and a maximum allowable deviation in any subcategory. The functionality and results of the developed workflow are described in the context of sampling the Medical Imaging and Data Resource Center (MIDRC) data commons for algorithm performance assessment. An initial cohort of over 4000 patients was selected from the MIDRC public data commons. The task-based sampling algorithm was used to select samples matched to an approximate CDC demographic distribution with maximum allowable deviations of 5% and 10%. Resulting final cohorts of 542 and 870 unique patients with average clinical attribute differences of 1.0% and 2.1% were sampled, respectively. This investigation demonstrates that the developed task-based sampling algorithm can generate matched samples from a large dataset for reducing sampling bias in algorithm training and performance assessment.
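
The abstract names three user inputs (an initial cohort, a target distribution profile, and a maximum allowable deviation per subcategory) but does not describe the selection procedure itself. The following is only a minimal sketch of one way such a matcher could work, not the authors' algorithm: a greedy loop that removes one patient at a time from the most over-represented subcategory until every subcategory is within tolerance. The function names, the dictionary-based patient records, and the `sex` attribute are hypothetical, and the sketch assumes the target shares for each attribute sum to 1.

```python
from collections import Counter


def deviations(cohort, targets):
    """Signed deviation (observed share minus target share) per (attribute, category)."""
    n = len(cohort)
    devs = {}
    for attr, target in targets.items():
        counts = Counter(patient[attr] for patient in cohort)
        # Include categories that appear only in the cohort or only in the target.
        for cat in set(counts) | set(target):
            devs[(attr, cat)] = counts.get(cat, 0) / n - target.get(cat, 0.0)
    return devs


def sample_to_target(cohort, targets, max_dev):
    """Greedily shrink `cohort` until every subcategory is within `max_dev` of its target.

    Hypothetical illustration only; the published method may differ.
    """
    for attr, target in targets.items():
        if abs(sum(target.values()) - 1.0) > 1e-9:
            raise ValueError(f"target shares for {attr!r} must sum to 1")

    sample = list(cohort)
    while sample:
        devs = deviations(sample, targets)
        if max(abs(d) for d in devs.values()) <= max_dev:
            return sample
        # Remove one patient from the most over-represented subcategory.
        attr, cat = max(devs, key=devs.get)
        idx = next(i for i, p in enumerate(sample) if p[attr] == cat)
        del sample[idx]
    raise ValueError("no sample satisfies the requested tolerance")


# Usage: 60 female and 40 male patients, matched to a 50/50 target within 5%.
cohort = [{"sex": "F"}] * 60 + [{"sex": "M"}] * 40
matched = sample_to_target(cohort, {"sex": {"F": 0.5, "M": 0.5}}, max_dev=0.05)
```

With several attributes, removing a patient to fix one attribute can worsen another, so this greedy rule may discard more patients than necessary or fail to find a valid sample; a production implementation would likely need a smarter selection criterion.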

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging
