This is an overview page with metadata for this scientific work. The full article is available from the publisher.
MoCA-Dialog: A Benchmark for Fine-Grained Evaluation of Large Language Models in Clinical Cognitive Assessment Dialogues
Citations: 0
Authors: 6
Year: 2025
Abstract
Current benchmarks for evaluating large language models (LLMs) in medicine primarily focus on static question answering, neglecting the dynamic, interactive nature of many clinical workflows. This is particularly evident in cognitive assessments, where nuanced dialogue and multimodal interpretation are critical, yet no specialized benchmark exists to evaluate these capabilities. To address this gap, we introduce MoCA-Dialog, the first large-scale, multimodal benchmark featuring 5,000 high-fidelity simulated dialogues for the Montreal Cognitive Assessment (MoCA). The benchmark includes tasks of increasing complexity: accurate scoring, cognitive profile generation, and clinical error attribution, allowing for a fine-grained evaluation across seven cognitive domains. Our comprehensive evaluation of state-of-the-art multimodal LLMs reveals significant performance disparities; while models excel at simple recall tasks, they consistently fail in domains requiring executive function and abstract reasoning. A novel, clinically-driven error analysis further indicates that these failures stem not from knowledge deficits, but from fundamental difficulties in interpreting nuanced cues and applying domain-specific reasoning. MoCA-Dialog provides a crucial tool for assessing the clinical readiness of LLMs and highlights that future progress depends on enhancing their core reasoning and interpretive abilities, not just expanding their knowledge base. We release our demo and prompt examples at https://mocadialogue.github.io.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 cit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 cit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 cit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 cit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 cit.