This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Beyond ideal data: evaluating AI model behaviour in real-world echocardiographic images
Citations: 0
Authors: 12
Year: 2025
Abstract
Introduction: Integrating artificial intelligence (AI) into clinical practice requires validation of model reliability and performance to ensure compliance with standards. State-of-the-art studies predominantly train AI models on high-quality or laboratory echocardiographic images, excluding low-quality ones [1, 2]. This raises concerns about generalizability when these models are applied to the medical images of varying quality encountered in real-world clinical practice.

Purpose: This study evaluates how image quality affects AI model performance in echocardiographic view identification and examines whether training with real-world images of diverse quality improves performance.

Methods: A dataset of 4907 sequences from 407 patients (mean age 62.4 ± 13.6 years; 60.4% male) was classified by 22 experts (mean expertise 5.17 ± 3.13 years) into 13 echocardiographic views, grouped by projection: apical (2-chamber, 3-chamber, 4-chamber, 5-chamber, other); parasternal short axis (great vessels, papillary muscles/mitral valve, apex, other); parasternal long axis; subcostal; suprasternal; and other. The sequences were also classified into standard (STD), compromised (LOW), and severely compromised (UNACCEPT) quality. Each sequence was annotated 2 to 5 times, until a majority consensus among experts was reached. A ResNet50-v2 backbone was selected as the AI model, in line with the state of the art [3]. The model was trained for view identification under two scenarios (Figure 1): using only the STD subset (Experiment A), and using the full dataset, including all quality categories (Experiment B). Performance, defined as view classification accuracy, was evaluated independently for each quality category (STD, LOW and UNACCEPT) and for their combination (ALL). An 80:20 training-testing split was used.
Results: Validation performance was influenced by the quality category in both experiments, with a notable decrease in the severely compromised category (see Table 1). This underscores the importance of rigorous model validation using real-world data to assess limitations before clinical integration. Experiment B consistently outperformed Experiment A across all validation scenarios, including STD, despite Experiment A being trained specifically on that subset. This demonstrates that training with images of diverse quality enhances overall generalization, not just performance on compromised-quality sets.

Conclusions: This study quantifies the effect of image quality on the performance of AI-based echocardiographic view identification and examines the influence of training with varied-quality images. Findings indicate that (1) incorporating compromised images enhances overall model generalization and robustness, and (2) realistic datasets are essential for the reliable integration of AI models into clinical practice.
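The experimental design above (an STD-only training set versus a full-dataset training set, an 80:20 split, and per-quality-category evaluation) can be sketched as follows. This is a minimal illustrative sketch using synthetic labels: the view names, record layout, and random label assignment are assumptions; only the sequence count (4907), the view count (13), the quality categories, and the split ratio come from the abstract.

```python
import random
from collections import defaultdict

# Hypothetical sequence records: (sequence_id, view_label, quality), with
# quality drawn from the three categories named in the abstract.
random.seed(0)
VIEWS = [f"view_{i}" for i in range(13)]        # 13 echocardiographic views
QUALITIES = ["STD", "LOW", "UNACCEPT"]
sequences = [(i, random.choice(VIEWS), random.choice(QUALITIES))
             for i in range(4907)]               # 4907 sequences, as reported

def split_80_20(items):
    """Shuffle and split into 80% training / 20% testing, as in the study."""
    items = items[:]
    random.shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

# Experiment A: train only on standard-quality (STD) sequences.
std_only = [s for s in sequences if s[2] == "STD"]
train_a, test_a = split_80_20(std_only)

# Experiment B: train on the full dataset, all quality categories.
train_b, test_b = split_80_20(sequences)

# Accuracy is then reported per quality category (STD, LOW, UNACCEPT)
# and over the combined test set (ALL); here we only group the test
# sequences by category, since the model itself is out of scope.
by_quality = defaultdict(list)
for seq in test_b:
    by_quality[seq[2]].append(seq)
```

The sketch omits the classifier itself (a ResNet50-v2 backbone in the study); any deep-learning framework's standard image-classification loop would slot in where the train/test lists are consumed.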
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,456 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,332 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,779 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,533 citations