This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators.
Citations: 1 · Authors: 11 · Year: 2025
Abstract
Although large language models (LLMs) have been assessed for general medical knowledge using licensing exams, their ability to support clinical decision-making, such as selecting medical calculators, remains uncertain. We evaluated medical trainees and LLMs in recommending medical calculators across clinical scenarios such as risk stratification and diagnosis: nine LLMs, including open-source, proprietary, and domain-specific models, were assessed on 1,009 multiple-choice question-answer pairs spanning 35 clinical calculators and compared to humans on a subset of questions. While the highest-performing LLM, OpenAI o1, reached an answer accuracy of 66.0% (CI: 56.7-75.3%) on the 100-question subset, two human annotators nominally outperformed the LLMs with an average answer accuracy of 79.5% (CI: 73.5-85.0%). With error analysis showing that the highest-performing LLMs continue to make mistakes in comprehension (49.3% of errors) and calculator knowledge (7.1% of errors), our findings highlight that LLMs are not superior to humans in calculator recommendation.
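As a rough illustration of how the reported accuracy figures and intervals can be read, the sketch below computes answer accuracy with a normal-approximation (Wald) confidence interval for a multiple-choice evaluation. This is an assumption for illustration only: the page does not state which CI method the authors used, though a Wald interval for 66 correct out of 100 happens to reproduce the reported 56.7-75.3% range.

```python
import math

def accuracy_with_ci(n_correct: int, n_total: int, z: float = 1.96):
    """Answer accuracy with a normal-approximation (Wald) 95% confidence interval.

    Illustrative sketch only; the paper's exact CI method is not given here.
    """
    p = n_correct / n_total
    half_width = z * math.sqrt(p * (1 - p) / n_total)
    # Clamp to [0, 1] so the interval stays a valid proportion.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

if __name__ == "__main__":
    # Hypothetical check against the abstract: OpenAI o1 on the 100-question subset.
    acc, lo, hi = accuracy_with_ci(66, 100)
    print(f"accuracy = {acc:.1%}, 95% CI = {lo:.1%} - {hi:.1%}")
```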
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations