
This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Repeatability, Reproducibility, and Diagnostic Accuracy of a Commercial Large Language Model (ChatGPT) to Perform Disaster Triage Using the Simple Triage and Rapid Treatment (START) Protocol

2024 · 0 citations · Disaster Medicine and Public Health Preparedness · Open Access

Authors: 5

Abstract

Objective: The release of ChatGPT in November 2022 drastically lowered the barrier to artificial intelligence with an intuitive web-based interface to a large language model. This study addressed the research problem: “Can ChatGPT adequately triage simulated disaster patients using the Simple Triage and Rapid Treatment (START) tool?”

Methods: Five trained disaster medicine physicians developed nine prompts. A Python script queried ChatGPT Version 4 with each prompt combined with 391 validated patient vignettes. Ten repetitions of each combination were performed, for a total of 35,190 simulated triages.

Results: A valid START score was returned in 35,102 queries (99.7%). There was considerable variability in the results. Repeatability (use of the same prompt repeatedly) was responsible for 14.0% of overall variation. Reproducibility (use of different prompts) was responsible for 4.1% of overall variation. Accuracy of ChatGPT for START was 61.4%, with a 5.0% under-triage rate and a 33.6% over-triage rate. Accuracy varied by prompt between 45.8% and 68.6%.

Conclusions: This study suggests that the current ChatGPT large language model is not sufficient for triage of simulated patients using START due to poor repeatability and accuracy. Medical practitioners should be aware that while ChatGPT can be a valuable tool, it may lack consistency and may provide false information.
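The query count and the three headline rates in the abstract can be illustrated with a short Python sketch. This is not the authors' script: the names `enumerate_queries`, `triage_rates`, and `ACUITY` are hypothetical, and the acuity ordering of the START categories (in particular where BLACK sits) is an assumption made only so that over-triage and under-triage can be defined. The call to the language model is deliberately left out.

```python
from itertools import product

# Counts reported in the abstract.
N_PROMPTS, N_VIGNETTES, N_REPETITIONS = 9, 391, 10

# ASSUMPTION: an acuity ordering chosen for illustration only. The abstract
# does not state how the authors ranked the categories.
ACUITY = {"GREEN": 0, "YELLOW": 1, "RED": 2, "BLACK": 3}


def enumerate_queries():
    """Yield every (prompt, vignette, repetition) index combination."""
    return product(range(N_PROMPTS), range(N_VIGNETTES), range(N_REPETITIONS))


def triage_rates(pairs):
    """Return (accuracy, under_triage, over_triage) as fractions.

    `pairs` is an iterable of (predicted, reference) category names.
    A prediction below the reference acuity counts as under-triage,
    one above it as over-triage.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (predicted, reference) pair is required")
    correct = under = over = 0
    for predicted, reference in pairs:
        diff = ACUITY[predicted] - ACUITY[reference]
        if diff == 0:
            correct += 1
        elif diff < 0:
            under += 1
        else:
            over += 1
    n = len(pairs)
    return correct / n, under / n, over / n


# 9 prompts x 391 vignettes x 10 repetitions.
total_queries = sum(1 for _ in enumerate_queries())

# Tiny made-up example, not study data.
example_rates = triage_rates(
    [("RED", "RED"), ("GREEN", "RED"), ("RED", "GREEN"), ("YELLOW", "YELLOW")]
)
```

Under these assumptions, `total_queries` matches the 35,190 simulated triages reported in the abstract, and the three rates always sum to 1, as the reported 61.4%, 5.0%, and 33.6% do.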

Similar works