OpenAlex · Updated hourly · Last updated: 13.03.2026, 21:07

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Do artificial intelligence methods in echocardiography know when closer human scrutiny is needed?

2025 · 0 citations · European Heart Journal - Cardiovascular Imaging

0 citations · 15 authors · Year: 2025

Abstract

Background: Artificial intelligence (AI) measurement has the potential to transform cardiac imaging, but what if it makes a mistake? Can AI also highlight when it is most likely to have mismeasured, and can experts improve these measurements? As readers become increasingly reliant on automated measures for echocardiographic analysis, we need confidence that the software will measure accurately and will flag where human oversight is needed.

Purpose: To develop and test an open, scientific, machine-learning method of validly judging the level of certainty in an AI measurement, so that human oversight can focus on the measurements most likely to be incorrect.

Methods: The heatmap altitude (MHA) at the measurement point is widely used as an automatic index of the predicted reliability of an AI measurement. It is derived from the raw neural network output and gives a confidence level between 0 and 1 for each measurement the AI makes. We tested it not only against the AI's deviation from the expert consensus but also, by using multiple experts per case, against each individual expert's deviation from the consensus, using the Unity UK Echocardiography AI collaborative's dataset of 200 parasternal long-axis images. Each image was labelled by 10 experts with key points for the aortic annulus, sinus, sinotubular junction, and proximal ascending aorta dimensions. The mean expert consensus measurement was obtained, and the median deviation of the AI and of the other experts was then calculated.

Results: The heatmap altitude was skewed, with median 0.822 (IQR 0.793 to 0.841; 10th to 90th percentile 0.734 to 0.852). Images were grouped by decile, from the 20 images with the lowest confidence to the 20 with the highest (Figure 1). Both the AI error and the expert error (deviation from the expert consensus) were greatest in the lowest-confidence decile and progressively lower in each decile of greater confidence (p < 0.001 for trend). In the better nine deciles, the AI error was smaller than the human error (p < 0.001); in the worst decile the two were equivalent (p = NS).

Conclusion: The fully automated and open-source MHA value usefully quantifies the reliability of an AI measurement. It could be used to target human oversight where it is most needed, but care is required because experts may find the same images similarly difficult.
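The decile analysis described in the Methods and Results can be illustrated with a short sketch: sort images by the AI's confidence score (MHA, between 0 and 1), split them into ten equal-sized groups, and compute the median AI and expert error within each group. This is a hypothetical illustration of the general technique, not the authors' actual analysis code; the function name and the synthetic data are assumptions.

```python
import numpy as np

def decile_error_analysis(confidence, ai_error, expert_error, n_groups=10):
    """Group images into equal-sized bins by AI confidence (MHA-like score)
    and return (median AI error, median expert error) per bin,
    ordered from lowest-confidence bin to highest."""
    order = np.argsort(confidence)          # indices, ascending confidence
    bins = np.array_split(order, n_groups)  # ~equal-sized confidence deciles
    return [(float(np.median(ai_error[idx])),
             float(np.median(expert_error[idx])))
            for idx in bins]

# Toy demonstration with synthetic data (illustrative only, NOT the study's data):
rng = np.random.default_rng(0)
conf = rng.uniform(0.70, 0.86, 200)                 # MHA-like confidence scores
ai_err = (0.9 - conf) * rng.exponential(2.0, 200)   # error shrinks as confidence grows
ex_err = (0.9 - conf) * rng.exponential(2.5, 200)
per_decile = decile_error_analysis(conf, ai_err, ex_err)
```

With 200 images and ten groups this reproduces the abstract's layout of 20 images per decile; a trend test (e.g. across the ten median errors) would then check whether error falls monotonically with confidence.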
