This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Evaluating AI Chatbots in Neurological Function Test Interpretation for Brain Tumor Surgery
Citations: 0
Authors: 5
Year: 2025
Abstract
Background: Neuropsychological assessments are essential for evaluating functional status and guiding surgical planning in patients with brain tumors. However, their complexity may hinder interpretation for patients and junior clinicians. Large language model (LLM)-based chatbots have emerged as tools for providing medical information, but their ability to interpret real-world neuropsychological test results remains unevaluated. This study investigates whether LLMs can provide accurate, patient-friendly explanations of neuropsychological tests and support communication in neurosurgical care.

Methods: We included 20 patients who underwent at least one of five neuropsychological tests (Seoul Neuropsychological Screening Battery, Albert Test, Line Bisection Test, Boston Naming Test, or Western Aphasia Battery) prior to brain tumor surgery. Three LLMs (ChatGPT, Copilot, and Perplexity) were prompted with standardized queries on test explanations and tumor localization. Responses were evaluated for readability using the Flesch-Kincaid Grade Level, for understandability using a modified Patient Education Materials Assessment Tool, and for explanatory accuracy using an expert-rated 4-point scale. Tumor localization accuracy was assessed on a binary scale, and a patient survey assessed the perceived usefulness of the top-performing model.

Results: Readability scores ranged from 9.6 (Copilot) to 11.0 (Perplexity). Understandability scores were highest for ChatGPT (83.2%), followed by Perplexity (81.3%) and Copilot (66.4%). ChatGPT performed best in explaining test purpose and methodology; Perplexity scored highest in result interpretation and overall accuracy. Tumor localization accuracy was limited across all models (≤ 45%). Patients rated Perplexity highly in understandability (4.0/4.0) and usefulness (3.8/4.0).

Conclusions: LLMs generated accurate, understandable explanations of neuropsychological tests. These tools may support multidisciplinary care and patient communication in brain tumor surgery.
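For context, the Flesch-Kincaid Grade Level cited in the abstract is the standard readability formula (the abstract itself does not restate it); its score approximates the U.S. school grade needed to understand a text:

\[ \text{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

The reported range of 9.6 (Copilot) to 11.0 (Perplexity) thus corresponds to roughly a 10th- to 11th-grade reading level.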
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations