This is an overview page with metadata for this scientific work. The full article is available from the publisher.
MoCA-Dialog: A Benchmark for Fine-Grained Evaluation of Large Language Models in Clinical Cognitive Assessment Dialogues
Citations: 0
Authors: 6
Year: 2025
Abstract
Current benchmarks for evaluating large language models (LLMs) in medicine primarily focus on static question answering, neglecting the dynamic, interactive nature of many clinical workflows. This is particularly evident in cognitive assessments, where nuanced dialogue and multimodal interpretation are critical, yet no specialized benchmark exists to evaluate these capabilities. To address this gap, we introduce MoCA-Dialog, the first large-scale, multimodal benchmark featuring 5,000 high-fidelity simulated dialogues for the Montreal Cognitive Assessment (MoCA). The benchmark includes tasks of increasing complexity: accurate scoring, cognitive profile generation, and clinical error attribution, allowing for a fine-grained evaluation across seven cognitive domains. Our comprehensive evaluation of state-of-the-art multimodal LLMs reveals significant performance disparities; while models excel at simple recall tasks, they consistently fail in domains requiring executive function and abstract reasoning. A novel, clinically-driven error analysis further indicates that these failures stem not from knowledge deficits, but from fundamental difficulties in interpreting nuanced cues and applying domain-specific reasoning. MoCA-Dialog provides a crucial tool for assessing the clinical readiness of LLMs and highlights that future progress depends on enhancing their core reasoning and interpretive abilities, not just expanding their knowledge base. We release our demo and prompt examples at https://mocadialogue.github.io.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 cit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 cit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 cit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 cit.
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 cit.