This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The Expertise Paradox: Who Benefits from LLM-Assisted Brain MRI Differential Diagnosis?
Citations: 0
Authors: 22
Year: 2025
Abstract
Purpose: To evaluate how reader experience influences the diagnostic benefit from LLM assistance in brain MRI differential diagnosis.

Materials and Methods: Neuroradiologists (n = 4), radiology residents (n = 4), and neurology/neurosurgery residents (n = 4) were recruited. A dataset of complex brain MRI cases was curated from the local imaging database (n = 40). For each case, readers provided a textual description of the main imaging finding and their top three differential diagnoses ("Unassisted"). Three state-of-the-art large language models (GPT-4.1, Gemini 2.5 Pro, DeepSeek-R1) were prompted to generate top-three differentials based on the clinical case description and reader-specific findings. Readers then revised their differential diagnoses after reviewing GPT-4.1 suggestions ("Assisted"). To evaluate the association between reader experience and diagnostic benefit, a cumulative link mixed model (CLMM) was fitted, with change in diagnostic result as ordinal outcome, reader experience as predictor, and random intercepts for rater and case.

Results: LLM-generated differential diagnoses achieved the highest top-3 accuracy when provided with image descriptions from neuroradiologists (top-3: 78.8-83.8%), followed by radiology residents (top-3: 71.8-77.6%), and neurology/neurosurgery residents (top-3: 62.6-64.5%). In contrast, mean relative gains in top-3 accuracy through LLM assistance diminished with increasing experience, with +19.2% for neurology/neurosurgery residents (from 43.2% to 62.6%), +14.7% for radiology residents (from 59.6% to 74.4%), and +4.4% for neuroradiologists (from 83.1% to 87.5%). The CLMM demonstrated a significant negative association between reader experience and diagnostic benefit from LLM assistance (β = −0.10, p = 0.005).

Conclusion: With increasing reader experience, absolute diagnostic LLM performance with reader-generated input improved, while relative diagnostic gains through LLM assistance paradoxically diminished. Our findings call attention to the divergence between standalone LLM performance and clinically relevant reader benefit, and emphasize the need to account for human-AI interaction in this context.
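The top-3 accuracy and assistance gain reported in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that do not come from the paper: the function name `top3_accuracy` is hypothetical, the four cases are invented toy data, and a hit is decided by case-insensitive exact string matching, whereas the abstract does not say how the study scored a differential against the reference diagnosis.

```python
from typing import Sequence


def top3_accuracy(differentials: Sequence[Sequence[str]], truths: Sequence[str]) -> float:
    """Fraction of cases whose reference diagnosis is among the first three differentials.

    Assumption: case-insensitive exact string match counts as a hit.
    """
    if len(differentials) != len(truths):
        raise ValueError("one differential list per case is required")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(
        truth.lower() in [d.lower() for d in ranked[:3]]
        for ranked, truth in zip(differentials, truths)
    )
    return hits / len(truths)


# Hypothetical toy data, NOT taken from the study.
truths = ["glioblastoma", "multiple sclerosis", "cerebral abscess", "CNS lymphoma"]
unassisted = [
    ["metastasis", "glioblastoma", "abscess"],
    ["small vessel disease", "migraine", "vasculitis"],
    ["glioblastoma", "metastasis", "CNS lymphoma"],
    ["CNS lymphoma", "glioblastoma", "metastasis"],
]
assisted = [
    ["glioblastoma", "metastasis", "abscess"],
    ["multiple sclerosis", "small vessel disease", "vasculitis"],
    ["glioblastoma", "metastasis", "CNS lymphoma"],
    ["CNS lymphoma", "glioblastoma", "metastasis"],
]

acc_unassisted = top3_accuracy(unassisted, truths)
acc_assisted = top3_accuracy(assisted, truths)

# Gain expressed in percentage points (assisted minus unassisted).
gain_points = 100 * (acc_assisted - acc_unassisted)
print(f"unassisted={acc_unassisted:.2f} assisted={acc_assisted:.2f} gain={gain_points:+.1f} points")
```

In this toy example the gain is a difference in percentage points between the assisted and unassisted accuracies, which is also how the before and after values quoted in the abstract relate to the reported gains.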
Authors
- Severin Schramm
- Bastien Le Guellec
- Marlene Topka
- M. Švec
- Paul Backhaus
- Viktor Maria Eisenkolb
- Evamaria Olga Riedel
- Mirjam Beyrle
- Paul-Sören Platzek
- Constanze Ramschütz
- Karolin J. Paprottka
- Martin Renz
- Jannis Bodden
- Jan S. Kirschke
- Sebastian Ziegelmeyer
- Felix Busch
- Marcus R. Makowski
- Lisa C. Adams
- Keno K. Bressem
- Dennis M. Hedderich
- Benedikt Wiestler
- Su Hwan Kim