OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 11.03.2026, 21:26

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation

2025·13 Zitationen·Brain SciencesOpen Access
Volltext beim Verlag öffnen

13

Zitationen

25

Autoren

2025

Jahr

Abstract

<b>Background/Objectives</b>: Artificial intelligence (AI), particularly large language models (LLMs), has demonstrated versatility in various applications but faces challenges in specialized domains like neurology. This study evaluates a specialized LLM's capability and trustworthiness in complex neurological diagnosis, comparing its performance to neurologists in simulated clinical settings. <b>Methods</b>: We deployed GPT-4 Turbo (OpenAI, San Francisco, CA, US) through Neura (Sciense, New York, NY, US), an AI infrastructure with a dual-database architecture integrating "long-term memory" and "short-term memory" components on a curated neurological corpus. Five representative clinical scenarios were presented to 13 neurologists and the AI system. Participants formulated differential diagnoses based on initial presentations, followed by definitive diagnoses after receiving conclusive clinical information. Two senior academic neurologists blindly evaluated all responses, while an independent investigator assessed the verifiability of AI-generated information. <b>Results</b>: AI achieved a significantly higher normalized score (86.17%) compared to neurologists (55.11%, <i>p</i> < 0.001). For differential diagnosis questions, AI scored 85% versus 46.15% for neurologists, and for final diagnosis, 88.24% versus 70.93%. AI obtained 15 maximum scores in its 20 evaluations and responded in under 30 s compared to neurologists' average of 9 min. All AI-provided references were classified as relevant with no hallucinatory content detected. <b>Conclusions</b>: A specialized LLM demonstrated superior diagnostic performance compared to practicing neurologists across complex clinical challenges. This indicates that appropriately harnessed LLMs with curated knowledge bases can achieve domain-specific relevance in complex clinical disciplines, suggesting potential for AI as a time-efficient asset in clinical practice.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationMachine Learning in HealthcareBiomedical Text Mining and Ontologies
Volltext beim Verlag öffnen