This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Opportunities and Challenges of Visual Large Language Models in Imaging Diagnostics: Lessons from Brain Metastasis Detection in Clinical MRI
Citations: 0
Authors: 9
Year: 2026
Abstract
Background/Objectives: To evaluate the diagnostic accuracy of two visual large language models (vLLMs), GPT-4o (OpenAI) and Claude Sonnet 3.5 (Anthropic), for detecting brain metastases in routine MRI using combined imaging and textual input.
Methods: This retrospective study included 31 patients with and 46 without brain metastases with underlying melanoma (n = 24), lung cancer (n = 23), breast cancer (n = 17), or renal cell carcinoma (n = 13). In total, 100 MRI examinations (50 with, 50 without metastases) were provided to both vLLMs using a single representative slice per sequence, together with clinical history and the referring question. The generated free-text reports were evaluated for detection accuracy, overdiagnosis, correct sequence recognition, anatomical localization, lesion laterality, and lesion size estimation.
Results: Both vLLMs showed perfect sensitivity (100% for both) but very low specificity (GPT-4o: 8%, Sonnet 3.5: 4%; p = 0.625), resulting in low diagnostic accuracy (GPT-4o: 54%, Sonnet 3.5: 52%; p = 0.625). Sequence identification was highly accurate in both models, with GPT-4o performing significantly better (100% vs. 93%; p < 0.05). Identification of the anatomical brain region (70% vs. 72%; p = 1.00) and lesion laterality (62% vs. 76%; p = 0.189) was comparable. Both models hallucinated additional lesions in 12% of cases. Lesion size measurements showed no significant differences between the models or in comparison with the radiologist.
Conclusions: GPT-4o and Claude Sonnet 3.5 can generate radiological reports and detect brain metastases with excellent sensitivity, but their very low specificity, frequent hallucinations, and limited spatial reliability currently preclude clinical application. Future work should address how the balance between visual and textual input influences diagnostic behavior in vLLMs.
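The reported accuracies follow arithmetically from the sensitivity and specificity figures once the class balance is fixed. The sketch below is illustrative only and is not the authors' analysis code. It assumes the metrics were computed per examination on the 50 positive and 50 negative MRI examinations described in the abstract, and the function names `confusion_counts` and `accuracy` are hypothetical helpers introduced here.

```python
def confusion_counts(n_pos, n_neg, sensitivity, specificity):
    """Recover confusion-matrix counts from rates (assumes per-examination metrics)."""
    tp = round(n_pos * sensitivity)   # positives correctly flagged
    tn = round(n_neg * specificity)   # negatives correctly cleared
    return {"tp": tp, "fn": n_pos - tp, "tn": tn, "fp": n_neg - tn}


def accuracy(counts):
    """Fraction of all examinations classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Figures taken from the abstract: 50 examinations with and 50 without metastases.
gpt4o = confusion_counts(50, 50, sensitivity=1.00, specificity=0.08)
sonnet = confusion_counts(50, 50, sensitivity=1.00, specificity=0.04)

print(gpt4o, accuracy(gpt4o))
print(sonnet, accuracy(sonnet))
```

Under that assumption, a specificity of 8% corresponds to 4 of 50 negative examinations correctly reported as metastasis-free, and 4% to 2 of 50. With balanced classes, accuracy is the mean of sensitivity and specificity, which is why perfect sensitivity still yields accuracy only slightly above 50%.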