OpenAlex · Updated hourly · Last updated: 15 April 2026, 22:25

This is an overview page with metadata about this scholarly article. The full article is available from the publisher.

Opportunities and Challenges of Visual Large Language Models in Imaging Diagnostics: Lessons from Brain Metastasis Detection in Clinical MRI

2026 · 0 citations · Diagnostics · Open Access
Open full text at publisher

Citations: 0 · Authors: 9 · Year: 2026

Abstract

Background/Objectives: To evaluate the diagnostic accuracy of two visual large language models (vLLMs), GPT-4o (OpenAI) and Claude Sonnet 3.5 (Anthropic), for detecting brain metastases in routine MRI using combined imaging and textual input.

Methods: This retrospective study included 77 patients, 31 with and 46 without brain metastases, whose underlying primary tumor was melanoma (n = 24), lung cancer (n = 23), breast cancer (n = 17), or renal cell carcinoma (n = 13). In total, 100 MRI examinations (50 with and 50 without metastases) were provided to both vLLMs as a single representative slice per sequence, together with the clinical history and the referring question. The generated free-text reports were evaluated for detection accuracy, overdiagnosis, correct sequence recognition, anatomical localization, lesion laterality, and lesion size estimation.

Results: Both vLLMs achieved perfect sensitivity (100%) but very low specificity (GPT-4o: 8%; Sonnet 3.5: 4%; p = 0.625), resulting in low diagnostic accuracy (GPT-4o: 54%; Sonnet 3.5: 52%; p = 0.625). Both models identified MRI sequences with high accuracy, with GPT-4o performing significantly better (100% vs. 93%; p < 0.05). Identification of the anatomical brain region (70% vs. 72%; p = 1.00) and of lesion laterality (62% vs. 76%; p = 0.189) was comparable between the models. Both models hallucinated additional lesions in 12% of cases. Lesion size estimates did not differ significantly between the models or from the radiologist's measurements.

Conclusions: GPT-4o and Claude Sonnet 3.5 can generate radiological reports and detect brain metastases with excellent sensitivity, but their very low specificity, frequent hallucinations, and limited spatial reliability currently preclude clinical use. Future work should examine how the balance between visual and textual input influences the diagnostic behavior of vLLMs.
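With 50 examinations with and 50 without metastases, the reported accuracies follow directly from sensitivity and specificity: accuracy = (TP + TN) / (P + N), where TP = sensitivity × P and TN = specificity × N. The short sketch below only checks that arithmetic using the percentages stated in the abstract; it is not the authors' analysis code, and the function name accuracy_from_rates is hypothetical.

```python
def accuracy_from_rates(sensitivity: float, specificity: float,
                        n_positive: int, n_negative: int) -> float:
    """Hypothetical helper: overall accuracy = (TP + TN) / (P + N),
    with TP = sensitivity * P and TN = specificity * N."""
    true_positives = sensitivity * n_positive
    true_negatives = specificity * n_negative
    return (true_positives + true_negatives) / (n_positive + n_negative)


# 50 examinations with and 50 without metastases, as stated in the abstract.
for model, sens, spec in [("GPT-4o", 1.00, 0.08), ("Claude Sonnet 3.5", 1.00, 0.04)]:
    acc = accuracy_from_rates(sens, spec, 50, 50)
    print(f"{model}: accuracy = {acc:.0%}")
# prints:
# GPT-4o: accuracy = 54%
# Claude Sonnet 3.5: accuracy = 52%
```

With a balanced 50/50 split, accuracy is simply the mean of sensitivity and specificity, which is why perfect sensitivity combined with near-zero specificity yields an accuracy only slightly above 50%.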

Topics

Brain Metastases and Treatment · Radiomics and Machine Learning in Medical Imaging · Artificial Intelligence in Healthcare and Education