This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Evaluating Large Language Models in the Image-Based Diagnosis of Intracranial Tumors
Citations: 0
Authors: 7
Year: 2025
Abstract
Background: Artificial intelligence (AI) tools exist at the intersection of machine learning and natural language processing and are poised to rapidly transform healthcare. There has been growing interest among clinicians in the ability of patient-accessible image analysis tools embedded in large language models (LLMs) to interpret raw clinical neuroimaging. Here, we compare the performance of GPT-4V and GPT-4o to that of neurosurgical attendings and trainees.

Methodology: A total of 20 brain MRI scans were included in this analysis: five gliomas, five meningiomas, five pituitary tumors, and five non-tumor control images. GPT-4V and GPT-4o were provided with identical prompts and MRI scans and asked to determine the most likely diagnosis. Model performance in classifying each MRI scan into one of these four categories was compared to survey responses from neurosurgery attendings, fellows, senior residents, and junior residents.

Results: GPT-4V correctly diagnosed 40% of cases (n = 20), whereas GPT-4o achieved 70% accuracy (n = 20). Neurosurgery attendings, fellows, and residents (n = 14) collectively identified the correct diagnosis in 84.6% of cases across the same 20 images (n = 280) based on a single cross-sectional MRI scan. Mean Cohen's kappa between surgeons and GPT-4V was 0.18; between surgeons and GPT-4o it was 0.51.

Conclusions: While LLMs underperformed surgeons in identifying central nervous system malignancies, GPT-4o demonstrated substantial improvement over GPT-4V, highlighting the rapid advancement of AI capabilities. Interrater reliability statistics provided further evidence that GPT-4o more closely approaches human-level performance than GPT-4V. Further refinement of these models may bridge the performance gap and expand their utility in clinical neuroimaging. Extra caution should be exercised when such models are applied at the individual patient level.
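The abstract reports agreement as Cohen's kappa, a chance-corrected statistic comparing two raters' category assignments. As a minimal sketch of how such a value is computed, the snippet below implements the standard formula κ = (p_o − p_e) / (1 − p_e) over two label sequences; the example labels are hypothetical and do not reproduce the study's data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    assigning each case to one categorical label."""
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical diagnoses over the study's four categories (illustration only):
truth = ["glioma", "glioma", "meningioma", "pituitary", "control", "control"]
model = ["glioma", "meningioma", "meningioma", "pituitary", "control", "glioma"]
print(round(cohen_kappa(truth, model), 3))
```

A kappa near 0 indicates agreement no better than chance, while 1.0 indicates perfect agreement, which is why the jump from 0.18 (GPT-4V) to 0.51 (GPT-4o) reflects a meaningful gain in human-model concordance.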
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,393 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,259 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,688 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,502 citations