This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Multimodal Performance of GPT-4 in Complex Ophthalmology Cases

2025 · 9 citations · Journal of Personalized Medicine · Open Access

Citations: 9 · Authors: 12 · Year: 2025

Abstract

Objectives: The integration of multimodal capabilities into GPT-4 represents a transformative leap for artificial intelligence in ophthalmology, yet its utility in scenarios requiring advanced reasoning remains underexplored. This study evaluates GPT-4's multimodal performance on open-ended diagnostic and next-step reasoning tasks in complex ophthalmology cases, comparing it against human expertise.

Methods: GPT-4 was assessed across three study arms: (1) text-based case details with figure descriptions, (2) cases with text and accompanying ophthalmic figures, and (3) cases with figures only (no figure descriptions). We compared GPT-4's diagnostic and next-step accuracy across arms and benchmarked its performance against three board-certified ophthalmologists.

Results: GPT-4 achieved 38.4% (95% CI [33.9%, 43.1%]) diagnostic accuracy and 57.8% (95% CI [52.8%, 62.2%]) next-step accuracy when prompted with figures without descriptions. Diagnostic accuracy declined significantly compared to text-only prompts (p = 0.007), though the next-step performance was similar (p = 0.140). Adding figure descriptions restored diagnostic accuracy (49.3%) to near parity with text-only prompts (p = 0.684). Using figures without descriptions, GPT-4's diagnostic accuracy was comparable to two ophthalmologists (p = 0.30, p = 0.41) but fell short of the highest-performing ophthalmologist (p = 0.0004). For next-step accuracy, GPT-4 was similar to one ophthalmologist (p = 0.22) but underperformed relative to the other two (p = 0.0015, p = 0.0017).

Conclusions: GPT-4's diagnostic performance diminishes when relying solely on ophthalmic images without textual context, highlighting limitations in its current multimodal capabilities. Despite this, GPT-4 demonstrated comparable performance to at least one ophthalmologist on both diagnostic and next-step reasoning tasks, emphasizing its potential as an assistive tool. Future research should refine multimodal prompts and explore iterative or sequential prompting strategies to optimize AI-driven interpretation of complex ophthalmic datasets.

Similar works