This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Outliers and anomalies in training and testing datasets for AI-powered morphometry—evidence from CT scans of the spleen
Citations: 1
Authors: 9
Year: 2025
Abstract
Introduction: Creating training and testing datasets for machine learning algorithms that measure the linear dimensions of organs is a tedious task. There are no universally accepted methods for evaluating outliers or anomalies in such datasets, which can cause errors in machine learning and compromise the quality of end products. The goal of this study is to identify optimal methods for detecting organ anomalies and outliers in medical datasets designed to train and test neural networks in morphometrics.
Methods: The study included n = 197 patients. Outliers were assessed using visual methods (the 1.5 × interquartile range rule, heat map, boxplot, histogram, scatter plot), machine learning algorithms (Isolation Forest; Density-Based Spatial Clustering of Applications with Noise (DBSCAN); k-nearest neighbors (KNN); Local Outlier Factor; one-class support vector machines (OSVM); EllipticEnvelope; autoencoders), and mathematical statistics (z-score, Grubbs' test, Rosner's test).
Results: We identified measurement errors, input errors, abnormal size values, and non-standard organ shapes (sickle-shaped, round, triangular, with additional lobules). The most effective methods were visual techniques (including boxplots and histograms) and machine learning algorithms such as OSVM, KNN, and autoencoders. In total, 32 outlier anomalies were found.
Discussion: Curation of complex morphometric datasets must involve thorough mathematical and clinical analyses. Relying solely on mathematical statistics or machine learning methods appears inadequate.
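The abstract names several detectors without showing how they score a measurement. The sketch below illustrates three of the simpler ones in plain Python: the 1.5 × IQR rule (Tukey's fences), the z-score, and a k-nearest-neighbors distance score. It is not the authors' implementation, which the abstract does not describe. The function names, the value of k, and all spleen measurements are hypothetical and chosen only for illustration.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartile conventions differ between tools; "inclusive" is one common
    # choice, and other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Indices whose |x - mean| / sample SD exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # an extreme value inflates this SD
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def knn_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbors.

    Larger scores mean more isolated points. There is no fixed cut-off, so
    the highest-scoring cases are the ones to review by hand.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Hypothetical spleen lengths in mm; 1030 mimics an input error (extra digit).
lengths_mm = [98, 102, 105, 110, 95, 101, 99, 104, 107, 100, 1030]
print(iqr_outliers(lengths_mm))     # [10]
print(zscore_outliers(lengths_mm))  # [10]

# Hypothetical (length, width, thickness) triples in mm; the last case is
# short and wide, loosely standing in for a non-standard organ shape.
dims_mm = [(100, 40, 45), (102, 42, 44), (98, 39, 46),
           (105, 41, 47), (101, 43, 45), (60, 90, 45)]
scores = knn_scores(dims_mm, k=2)
print(max(range(len(scores)), key=scores.__getitem__))  # 5
```

With only 11 values, no single z-score can exceed (n − 1)/√n ≈ 3.02, and the outlier itself inflates the standard deviation, so the |z| > 3 rule barely fires here. This masking effect is one motivation for Grubbs' test, which removes one suspected outlier per iteration, and Rosner's generalized ESD test, which checks for up to a preset number of outliers at once. For the multivariate detectors listed in the abstract, scikit-learn provides `IsolationForest`, `LocalOutlierFactor`, `OneClassSVM`, `EllipticEnvelope`, and `DBSCAN`.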
Similar works
A survey on deep learning in medical image analysis (2017 · 13,877 citations)
pROC: an open-source package for R and S+ to analyze and compare ROC curves (2011 · 13,746 citations)
Dermatologist-level classification of skin cancer with deep neural networks (2017 · 13,436 citations)
A survey on Image Data Augmentation for Deep Learning (2019 · 12,026 citations)
QuPath: Open source software for digital pathology image analysis (2017 · 8,375 citations)