Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark
13
Zitationen
19
Autoren
2024
Jahr
Abstract
Abstract The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the closeended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark ClinicBench . We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and clinical tasks that are complex but common in real-world practice, e.g., open-ended decision-making, long document processing, and emerging drug analysis. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs 1 .
Ähnliche Arbeiten
"Why Should I Trust You?"
2016 · 14.179 Zit.
A Comprehensive Survey on Graph Neural Networks
2020 · 8.561 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.071 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.429 Zit.
Analysis of Survival Data.
1985 · 4.379 Zit.