This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Are AI-generated electrocardiograms clinically accurate? Benchmarking accuracy of AI-generated ECGs: a multiplatform performance study of public LLMs

2026 · 0 citations · European Heart Journal - Digital Health · Open Access

Abstract

Background: The use of generative AI to simulate electrocardiograms (ECGs) is expanding in medical education and digital cardiology. However, the diagnostic accuracy of ECGs produced by publicly accessible AI platforms has not been systematically evaluated. This study assessed whether synthetic ECGs generated by general-purpose AI services can accurately represent pre-specified arrhythmias.

Purpose: To evaluate the diagnostic accuracy and interpretability of ECGs generated by three widely available public AI services when prompted to simulate specific cardiac rhythms.

Methods: Bard, Bing Image Creator, and DALL-E were each prompted to generate ECG strips for ten common cardiac rhythms: sinus rhythm, sinus tachycardia, sinus bradycardia, atrial fibrillation, atrial flutter, ventricular tachycardia, ventricular fibrillation, complete heart block, supraventricular tachycardia, and asystole. Each platform produced four ECGs per rhythm (n=120). After excluding duplicates and non-ECG outputs (n=7), 113 ECGs remained. Three blinded physicians, including a cardiologist, independently reviewed each image and attempted to diagnose the rhythm. Discrepancies were resolved via adjudication. Accuracy was defined as agreement between the prompted rhythm and final expert consensus. Results were stratified by platform and rhythm.

Results: Only 37 of 113 ECGs (32.7%) accurately matched the intended rhythm. Additionally, 25.1% were uninterpretable due to graphical artefacts, physiologically implausible tracings, or distorted morphology. Bard produced the highest proportion of correct ECGs (84.5%) but primarily retrieved existing online images. Bing and DALL-E achieved rhythm-matched outputs in only 12.5% and 10% of cases, respectively. Atrial flutter (58.3%) and atrial fibrillation (50%) were the most accurately generated rhythms.

Conclusion: Synthetic ECGs generated by public AI tools demonstrate poor and inconsistent diagnostic accuracy. While Bard produced more rhythm-matched images, these were often retrieved rather than generated. These findings highlight the current limitations of publicly available generative AI for ECG simulation and support the need for domain-specific models before integration into clinical education.
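The sample-size and overall-accuracy figures in the Methods and Results can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it uses solely the counts stated in the abstract (three platforms, ten rhythms, four ECGs each, seven exclusions, 37 correct). The per-platform and per-rhythm percentages are not recomputed because the abstract does not give their denominators.

```python
# Counts as stated in the abstract; variable names are illustrative assumptions.
platforms = ["Bard", "Bing Image Creator", "DALL-E"]
rhythms = [
    "sinus rhythm", "sinus tachycardia", "sinus bradycardia",
    "atrial fibrillation", "atrial flutter", "ventricular tachycardia",
    "ventricular fibrillation", "complete heart block",
    "supraventricular tachycardia", "asystole",
]
ecgs_per_rhythm = 4

# Total images requested: platforms x rhythms x repetitions.
generated = len(platforms) * len(rhythms) * ecgs_per_rhythm

# Duplicates and non-ECG outputs removed before review.
excluded = 7
analysed = generated - excluded

# ECGs whose expert-consensus diagnosis matched the prompted rhythm.
correct = 37
accuracy_pct = round(100 * correct / analysed, 1)

print(generated, analysed, accuracy_pct)
```

The printed values should correspond to n=120 generated, 113 analysed, and the 32.7% overall accuracy reported in the Results.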

Topics

ECG Monitoring and Analysis; Artificial Intelligence in Healthcare and Education; Cardiac electrophysiology and arrhythmias
