This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Impact of Large Language Model Assistance on Radiologists’ Diagnostic Performance for Brain Tumors by Experience Level
Citations: 0
Authors: 9
Year: 2026
Abstract
<b>Background</b>: Large language models (LLMs) may assist radiologists in interpreting brain tumor MRI. We compared the diagnostic accuracy of ChatGPT-4o and Claude 3.5 Sonnet with that of board-certified radiologists and trainees, and evaluated whether LLM assistance could enhance diagnostic performance.
<b>Methods</b>: A total of 127 histologically confirmed brain tumor cases were included. Two LLMs analyzed representative MRI images together with structured radiologic reports, whereas two board-certified radiologists and three trainees reviewed representative images with basic demographic information only. All participants generated up to three differential diagnoses per case. The accuracy of the primary diagnosis and the accuracy of the top-three differential diagnoses were calculated and compared. Following the initial readings, LLM-generated differential diagnoses were provided to the readers, and their post-assistance diagnostic performance was re-evaluated.
<b>Results</b>: Claude 3.5 Sonnet achieved a primary diagnostic accuracy of 50.4% and a top-three differential accuracy of 85.0%, comparable to ChatGPT-4o (44.9% and 82.7%, respectively). Radiologists demonstrated a higher primary diagnostic accuracy (69.3%, <i>p</i> < 0.001) compared to LLMs, but a similar top-three differential accuracy (80.7%). In contrast, trainees showed a primary diagnostic accuracy (48.0%) comparable to LLMs, but a lower top-three differential accuracy (62.5%) than LLMs. With LLM assistance, radiologists exhibited a significant improvement in the top-three differential accuracy (from 80.7% to 90.2%, <i>p</i> < 0.001), and trainees showed significant improvements in both the primary and top-three differential accuracy (from 48.0% to 58.8%, <i>p</i> < 0.001, and from 62.5% to 81.1%, <i>p</i> < 0.001, respectively).
<b>Conclusion</b>: LLMs demonstrated the ability to expand differential diagnostic considerations when operating on structured imaging inputs. LLM assistance was associated with improved trainee performance in this constrained experimental setting. These findings should be interpreted cautiously and require validation under balanced input conditions and clinically realistic workflows.
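The two metrics reported in the abstract, primary diagnostic accuracy and top-three differential accuracy, can be read as top-1 and top-3 accuracy over ranked differential lists. The following is a minimal sketch of that calculation, not the authors' analysis code. The function name `top_k_accuracy`, the case-insensitive exact string matching rule, and the four toy cases are all assumptions made for illustration; the abstract does not describe how a diagnosis was judged to match the histological result.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose confirmed diagnosis is among the first k ranked differentials.

    Matching is a case-insensitive exact string comparison (an assumption
    for this sketch, not a rule taken from the study).
    """
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(
        truth.lower() in [d.lower() for d in ranked[:k]]
        for ranked, truth in zip(predictions, truths)
    )
    return hits / len(truths)


# Hypothetical toy data: four cases, each with up to three ranked differentials.
truths = ["glioblastoma", "meningioma", "schwannoma", "metastasis"]
predictions = [
    ["glioblastoma", "metastasis", "lymphoma"],   # correct as primary diagnosis
    ["schwannoma", "meningioma"],                 # correct within the top three
    ["meningioma", "ependymoma", "schwannoma"],   # correct within the top three
    ["lymphoma", "glioblastoma", "abscess"],      # missed
]

primary = top_k_accuracy(predictions, truths, k=1)  # 1 of 4 cases
top3 = top_k_accuracy(predictions, truths, k=3)     # 3 of 4 cases
print(f"primary accuracy: {primary:.1%}, top-three accuracy: {top3:.1%}")
```

With this definition, top-three accuracy can never be lower than primary accuracy for the same reader, which is consistent with every pair of figures reported above.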