This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative diagnostic capability of large language models in neurosurgery
0 citations · 10 authors · 2026
Abstract
OBJECTIVE: OpenAI, Google, and Microsoft have recently developed popular large language models (LLMs) with promising clinical applications. LLMs specific to neurosurgery, such as AtlasGPT, have also been released recently. However, the comparative neurosurgical diagnostic capabilities of these models are not well studied. The aim of this study was to evaluate and compare the ability of LLMs to diagnose neurosurgical pathologies.

METHODS: Clinical vignettes (n = 148) extracted from a common neurosurgery case-based review textbook were stratified by subspecialty. OpenAI's ChatGPT-3.5 and ChatGPT-4, Google's Gemini, Microsoft Copilot, and AtlasGPT were prompted to provide a diagnosis: "Provide a neurosurgical diagnosis given the following history…[vignette]." Imaging was input for capable LLMs, and all queries were run in May 2024. Diagnoses were compared with the textbook for accuracy, and errors were categorized appropriately.

RESULTS: ChatGPT-4 was the most accurate model (74% correct), followed by AtlasGPT (63% correct), ChatGPT-3.5 (53% correct), Microsoft Copilot (48% correct), and Gemini (36% correct). Chi-square comparisons demonstrated that ChatGPT-4 was more accurate in providing clinical diagnoses than its counterparts (p = 0.005). Across all vignettes and LLMs, most errors were due to an inability to attribute a key piece of information (generally imaging data) to the diagnostic process while otherwise using logical stepwise reasoning.

CONCLUSIONS: ChatGPT-4 offered the most accurate diagnoses when given established clinical vignettes. Adding imaging processing capabilities and relevant data significantly increased the accuracy of LLM diagnoses. LLMs can offer accurate assessments of common neurosurgical conditions but require detailed prompting from clinicians. Artificial intelligence has substantial clinical potential; however, practitioners must remain cautious and think critically when using these models for diagnostic purposes.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,611 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,504 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,025 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations