Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Evaluating the use of GPT-3.5-turbo to provide clinical recommendations in the Emergency Department

2023·6 Zitationen·medRxivOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2023

Jahr

Abstract

Abstract The release of GPT-3.5-turbo (ChatGPT) and other large language models (LLMs) has the potential to transform healthcare. However, existing research evaluating LLM performance on real-world clinical notes is limited. Here, we conduct a highly-powered study to determine whether GPT-3.5-turbo can provide clinical recommendations for three tasks (admission status, radiological investigation(s) request status, and antibiotic prescription status) using clinical notes from the Emergency Department. We randomly select 10,000 Emergency Department visits to evaluate the accuracy of zero-shot, GPT-3.5-turbo-generated clinical recommendations across four different prompting strategies. We find that GPT-3.5-turbo performs poorly compared to a resident physician, with accuracy scores 24% lower on average. GPT-3.5-turbo tended to be overly cautious in its recommendations, with high sensitivity at the cost of specificity. Our findings demonstrate that, while early evaluations of the clinical use of LLMs are promising, LLM performance must be significantly improved before their deployment as decision support systems for clinical recommendations and other complex tasks.

Autoren

Institutionen

University of California, San Francisco(US)

Themen

Artificial Intelligence in Healthcare and EducationCOVID-19 diagnosis using AIMachine Learning in Healthcare

Volltext beim Verlag öffnen

Evaluating the use of GPT-3.5-turbo to provide clinical recommendations in the Emergency Department

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen