This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Importance of stratified sampling for use in the development of training and test sets: medical imaging AI applications
Citations: 3
Authors: 4
Year: 2025
Abstract
The purpose of our study was to understand the importance of stratified sampling across multiple dataset characteristics (attributes) to yield appropriate training and test sets for developing and evaluating AI. Sampling algorithms are widely used to split data into training and test cases in AI model development. Datasets are often split so that disease classes are balanced between the training set and the test set. However, other patient characteristics, such as demographic attributes (age, race, ethnicity, and sex), can also be used for stratification. Here, we measured the similarity of subsets stratified on demographic attributes and disease classes. To do this, we built on our previous work using the Jensen-Shannon distance (JSD), a measure of similarity between two distributions in which lower values indicate greater similarity. Previously, we had measured the similarity across datasets in terms of separate demographic attributes and disease states. In this study, we used a multidimensional JSD score that incorporates multiple demographic attributes and disease state into a single score. We calculated JSD scores that allowed us to compare the similarity of the subsets produced by the stratified sampling algorithm for each attribute separately and for all attributes combined (i.e., multidimensional JSD). A secondary aim of our study was to validate this generalized stratified sampling algorithm, which is used to sequester images in the Medical Imaging and Data Resource Center (MIDRC) database. The third aim was to calculate an upper limit for the JSD score to calibrate our intuition on the performance of the stratified sampling algorithm as compared to random sampling. The multidimensional JSD was calculated using an aggregate method. This method lists all possible combinations of attributes (demographic and disease state), counts the instances of each combination in each dataset, and compares the resulting distributions using the JSD.
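The aggregate method described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`joint_distribution`, `jensen_shannon_distance`, `multidimensional_jsd`) and the record layout (one dict per case) are hypothetical, the logarithm is taken to base 2 so that the distance lies between 0 and 1, and only combinations observed in at least one subset are enumerated, since combinations absent from both contribute nothing to the score.

```python
import math
from collections import Counter
from itertools import chain


def joint_distribution(records, attributes, support):
    """Relative frequency of each attribute combination in `support`."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    total = sum(counts.values())  # assumes `records` is non-empty
    return [counts[key] / total for key in support]


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logarithm)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0 by convention.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def multidimensional_jsd(set_a, set_b, attributes):
    """JSD between the joint attribute distributions of two subsets."""
    # Assumption: enumerate only combinations seen in either subset.
    support = sorted(
        {tuple(r[a] for a in attributes) for r in chain(set_a, set_b)}
    )
    p = joint_distribution(set_a, attributes, support)
    q = joint_distribution(set_b, attributes, support)
    return jensen_shannon_distance(p, q)


# Hypothetical toy data: two attributes, sex and disease state.
train = [{"sex": "F", "disease": "pos"}, {"sex": "M", "disease": "neg"}]
test = [{"sex": "F", "disease": "pos"}, {"sex": "M", "disease": "neg"}]
score = multidimensional_jsd(train, test, ["sex", "disease"])
```

Passing a single attribute name in `attributes` gives the per-attribute score, while passing all of them gives the combined score, which mirrors the two kinds of comparison mentioned in the abstract.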
Our results show that the multidimensional JSD scores ranged from 0.1843 to 0.2159 for random sampling and from 0.1468 to 0.1674 for stratified sampling. The lower scores for stratified sampling indicate that the framework yields training and test sets that are more similar to each other than those produced by random sampling. These results indicate the need for stratified sampling when training and testing AI.
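To make the idea of stratifying on several attributes at once concrete, the following sketch splits cases within each joint stratum (each observed combination of attribute values). It is an illustrative assumption and not the MIDRC sequestration algorithm: the function name `stratified_split`, the `test_fraction` and `seed` parameters, and the record layout are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(records, attributes, test_fraction=0.2, seed=0):
    """Split records so each joint stratum contributes ~test_fraction to the test set."""
    rng = random.Random(seed)

    # Group cases by their combination of attribute values (the joint stratum).
    strata = defaultdict(list)
    for record in records:
        strata[tuple(record[a] for a in attributes)].append(record)

    train, test = [], []
    for key in sorted(strata):  # sorted for reproducibility
        members = strata[key]
        rng.shuffle(members)
        # Very small strata may round to zero test cases.
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical toy cohort with two attributes.
cohort = (
    [{"sex": "F", "disease": "pos"} for _ in range(50)]
    + [{"sex": "M", "disease": "neg"} for _ in range(30)]
    + [{"sex": "M", "disease": "pos"} for _ in range(20)]
)
train_set, test_set = stratified_split(cohort, ["sex", "disease"])
```

Because the split is made inside each stratum, the joint attribute distribution of the test set tracks that of the training set by construction, whereas a purely random split matches it only on average. With many attributes the strata become small, and rounding then limits how closely the two subsets can match.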
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,349 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,219 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,631 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,480 citations