Multimodal Performance of GPT-4 in Complex Ophthalmology Cases
Citations: 9 · Authors: 12 · Year: 2025
Abstract
<b>Objectives</b>: The integration of multimodal capabilities into GPT-4 represents a transformative leap for artificial intelligence in ophthalmology, yet its utility in scenarios requiring advanced reasoning remains underexplored. This study evaluates GPT-4's multimodal performance on open-ended diagnostic and next-step reasoning tasks in complex ophthalmology cases and compares it against human expertise. <b>Methods</b>: GPT-4 was assessed across three study arms: (1) text-based case details with figure descriptions, (2) cases with text and accompanying ophthalmic figures, and (3) cases with figures only (no figure descriptions). We compared GPT-4's diagnostic and next-step accuracy across arms and benchmarked its performance against three board-certified ophthalmologists. <b>Results</b>: When prompted with figures without descriptions, GPT-4 achieved 38.4% (95% CI [33.9%, 43.1%]) diagnostic accuracy and 57.8% (95% CI [52.8%, 62.2%]) next-step accuracy. Diagnostic accuracy declined significantly compared with text-only prompts (<i>p</i> = 0.007), although next-step performance was similar (<i>p</i> = 0.140). Adding figure descriptions restored diagnostic accuracy (49.3%) to near parity with text-only prompts (<i>p</i> = 0.684). Using figures without descriptions, GPT-4's diagnostic accuracy was comparable to that of two ophthalmologists (<i>p</i> = 0.30, <i>p</i> = 0.41) but fell short of the highest-performing ophthalmologist (<i>p</i> = 0.0004). For next-step accuracy, GPT-4 was similar to one ophthalmologist (<i>p</i> = 0.22) but underperformed relative to the other two (<i>p</i> = 0.0015, <i>p</i> = 0.0017). <b>Conclusions</b>: GPT-4's diagnostic performance diminishes when it relies solely on ophthalmic images without textual context, highlighting limitations in its current multimodal capabilities.
Despite this, GPT-4 performed comparably to at least one ophthalmologist on both diagnostic and next-step reasoning tasks, underscoring its potential as an assistive tool. Future research should refine multimodal prompts and explore iterative or sequential prompting strategies to optimize AI-driven interpretation of complex ophthalmic datasets.
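The abstract reports 95% confidence intervals for accuracy but does not state how they were computed. As an illustration only, here is a minimal sketch of one common choice for a binomial proportion (correct answers out of cases), the Wilson score interval. The method, the function name `wilson_interval`, and the example counts are assumptions for illustration. They are not the authors' procedure or data.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as accuracy.

    Assumption: this is one common method for accuracy CIs; the study's own
    method is not stated in the abstract. z = 1.96 gives ~95% coverage.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total                      # observed accuracy
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom  # center is pulled toward 0.5
    half = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Wilson bounds already lie in [0, 1]; clamping only guards float round-off.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration only (not taken from the study).
lo, hi = wilson_interval(50, 100)
print(f"accuracy 50.0%, 95% CI [{lo:.1%}, {hi:.1%}]")
```

The Wilson interval is often preferred over the simple Wald interval because it behaves better at small sample sizes and near 0% or 100% accuracy. Other methods, such as Clopper-Pearson, would give slightly different bounds.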