This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Beyond ideal data: evaluating AI model behaviour in real-world echocardiographic images
Citations: 0
Authors: 12
Year: 2025
Abstract
Introduction: Integrating artificial intelligence (AI) into clinical practice requires validation of model reliability and performance to ensure compliance with standards. State-of-the-art studies predominantly train AI models on high-quality or laboratory echocardiographic images, excluding low-quality ones [1, 2]. This raises concerns about generalizability when these models are applied to the medical images of varying quality encountered in real-world clinical practice.

Purpose: This study evaluates how image quality affects AI model performance in echocardiographic view identification and examines whether training with real-world images of diverse quality improves performance.

Methods: A dataset of 4907 sequences from 407 patients (mean age 62.4 ± 13.6 years; 60.4% male) was classified by 22 experts (mean expertise 5.17 ± 3.13 years) into 13 echocardiographic views, grouped by projection: apical (2-chamber, 3-chamber, 4-chamber, 5-chamber, other); parasternal short axis (great vessels, papillary muscles/mitral valve, apex, other); parasternal long axis; subcostal; suprasternal; and other. The sequences were also classified into standard (STD), compromised (LOW), and severely compromised (UNACCEPT) quality. Each sequence was annotated 2 to 5 times, until a majority consensus among experts was reached. A ResNet50-v2 backbone was selected as the AI model, in line with the state of the art [3]. The model was trained for view identification under two scenarios (Figure 1): using only the STD subset (Experiment A), and using the full dataset, including all quality categories (Experiment B). Performance, defined as view classification accuracy, was evaluated independently for each quality category (STD, LOW and UNACCEPT) and for their combination (ALL). An 80:20 training-testing split was used.
Results: Validation performance was influenced by the quality category in both experiments, with a notable decrease in the severely compromised category (see Table 1). This underscores the importance of rigorous model validation using real-world data to assess limitations before clinical integration. Experiment B consistently outperformed Experiment A across all validation scenarios, including STD, despite Experiment A being trained specifically on that subset. This demonstrates that training with images of diverse quality enhances overall generalization, not just performance on compromised-quality sets.

Conclusions: This study quantifies the effect of image quality on the performance of AI-based echocardiographic view identification and examines the influence of training with varied-quality images. Findings indicate that (1) incorporating compromised images enhances overall model generalization and robustness, and (2) realistic datasets are essential for the reliable integration of AI models into clinical practice.
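The experimental design above (an STD-only training set versus a full-dataset training set, an 80:20 split, and per-quality-category evaluation) can be sketched as follows. This is a minimal illustrative sketch using synthetic labels: the view names, record layout, and random label assignment are assumptions; only the sequence count (4907), the view count (13), the quality categories, and the split ratio come from the abstract.

```python
import random
from collections import defaultdict

# Hypothetical sequence records: (sequence_id, view_label, quality), with
# quality drawn from the three categories named in the abstract.
random.seed(0)
VIEWS = [f"view_{i}" for i in range(13)]        # 13 echocardiographic views
QUALITIES = ["STD", "LOW", "UNACCEPT"]
sequences = [(i, random.choice(VIEWS), random.choice(QUALITIES))
             for i in range(4907)]               # 4907 sequences, as reported

def split_80_20(items):
    """Shuffle and split into 80% training / 20% testing, as in the study."""
    items = items[:]
    random.shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]

# Experiment A: train only on standard-quality (STD) sequences.
std_only = [s for s in sequences if s[2] == "STD"]
train_a, test_a = split_80_20(std_only)

# Experiment B: train on the full dataset, all quality categories.
train_b, test_b = split_80_20(sequences)

# Accuracy is then reported per quality category (STD, LOW, UNACCEPT)
# and over the combined test set (ALL); here we only group the test
# sequences by category, since the model itself is out of scope.
by_quality = defaultdict(list)
for seq in test_b:
    by_quality[seq[2]].append(seq)
```

The sketch omits the classifier itself (a ResNet50-v2 backbone in the study); any deep-learning framework's standard image-classification loop would slot in where the train/test lists are consumed.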
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,456 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,332 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,779 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,533 citations