This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Impact of Large Language Model Assistance on Radiologists’ Diagnostic Performance for Brain Tumors by Experience Level

2026 · 0 citations · Journal of Clinical Medicine · Open Access

Abstract

Background: Large language models (LLMs) may assist radiologists in interpreting brain tumor MRI. We compared the diagnostic accuracy of ChatGPT-4o and Claude 3.5 Sonnet with that of board-certified radiologists and trainees, and evaluated whether LLM assistance could enhance diagnostic performance.

Methods: A total of 127 histologically confirmed brain tumor cases were included. Two LLMs analyzed representative MRI images together with structured radiologic reports, whereas two board-certified radiologists and three trainees reviewed representative images with basic demographic information only. All participants generated up to three differential diagnoses per case. The accuracy of the primary diagnosis and the accuracy of the top-three differential diagnoses were calculated and compared. Following the initial readings, LLM-generated differential diagnoses were provided to the readers, and their post-assistance diagnostic performance was re-evaluated.

Results: Claude 3.5 Sonnet achieved a primary diagnostic accuracy of 50.4% and a top-three differential accuracy of 85.0%, comparable to ChatGPT-4o (44.9% and 82.7%, respectively). Radiologists demonstrated a higher primary diagnostic accuracy (69.3%, p < 0.001) compared to LLMs, but a similar top-three differential accuracy (80.7%). In contrast, trainees showed a primary diagnostic accuracy (48.0%) comparable to LLMs, but a lower top-three differential accuracy (62.5%) than LLMs. With LLM assistance, radiologists exhibited a significant improvement in the top-three differential accuracy (from 80.7% to 90.2%, p < 0.001), and trainees showed significant improvements in both the primary and top-three differential accuracy (from 48.0% to 58.8%, p < 0.001, and from 62.5% to 81.1%, p < 0.001, respectively).

Conclusion: LLMs demonstrated the ability to expand differential diagnostic considerations when operating on structured imaging inputs. LLM assistance was associated with improved trainee performance in this constrained experimental setting. These findings should be interpreted cautiously and require validation under balanced input conditions and clinically realistic workflows.
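The two metrics named in the Methods, primary-diagnosis accuracy and top-three differential accuracy, can be illustrated with a minimal sketch. This is not the authors' code: the function name `top_k_accuracy`, the exact-string matching rule, and the four toy cases are assumptions made purely for illustration, and the abstract does not say how a reader's answer was judged to match the histological diagnosis.

```python
from typing import Sequence


def top_k_accuracy(
    truths: Sequence[str],
    ranked: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of cases whose confirmed diagnosis is among the first k guesses.

    Matching is by exact string equality, which is an assumption of this sketch.
    """
    if len(truths) != len(ranked):
        raise ValueError("truths and ranked must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(
        1 for truth, guesses in zip(truths, ranked) if truth in guesses[:k]
    )
    return hits / len(truths)


# Hypothetical toy data, not taken from the study.
truths = ["glioblastoma", "meningioma", "metastasis", "lymphoma"]
ranked = [
    ["glioblastoma", "metastasis", "lymphoma"],  # correct at rank 1
    ["schwannoma", "meningioma"],                # correct at rank 2
    ["glioblastoma", "lymphoma", "metastasis"],  # correct at rank 3
    ["glioblastoma", "metastasis", "abscess"],   # not listed
]

primary = top_k_accuracy(truths, ranked, k=1)  # primary-diagnosis accuracy
top3 = top_k_accuracy(truths, ranked, k=3)     # top-three differential accuracy
print(f"primary: {primary:.1%}, top-three: {top3:.1%}")
```

On the toy data, one of four cases is correct at rank 1 and three of four are correct within the first three guesses, so the top-three figure can only be equal to or higher than the primary figure for the same reader.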

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Radiology practices and education
