This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Scalable screening for emergency department missed opportunities for diagnosis using sequential eTriggers and large language models

2025 · 0 citations · medRxiv · Open Access

Abstract

Importance: Missed opportunities for diagnosis (MODs), sometimes termed diagnostic errors, are a major cause of patient morbidity and mortality in the emergency department (ED). EDs have employed eTriggers, rule-based collections of cases likely to have a higher-than-average error rate (e.g., 72-hour returns with admission), but their utility is limited by low error yields. Large language models (LLMs) offer new opportunities to identify MODs and to contribute to both individual- and systems-level quality improvement.

Objective: To determine whether sequential screening of ED cases with eTriggers and an LLM identifies MODs more efficiently than eTriggers alone.

Design: Retrospective observational cohort study of ED encounters collected between March 2015 and June 2025.

Setting: 10 EDs (2 academic, 8 community) in a single US health system.

Participants: Emergency physicians used the SaferDx instrument to review and adjudicate random samples of cases identified by 3 previously validated eTriggers (72-hour return with admission, 10-day return with ICU admission, and floor-to-ICU escalation within 24 hours). An ED physician also evaluated a novel hybrid eTrigger that combines an LLM adjudicator with a rules engine for 9-day return admissions with emergency care–sensitive conditions (ECSCs).

Exposures: LLM adjudication of ED cases for MODs with Claude Sonnet 4, using an iteratively developed, standardized prompt incorporating the SaferDx instrument.

Main Outcome(s) and Measure(s): Positive predictive value (PPV), sensitivity, specificity, negative predictive value (NPV), and number needed to screen (NNS) for MODs. Reviewer time to adjudicate cases and quality improvement stakeholders' assessments of LLM case summaries were also measured.

Results: Of the 357 encounters reviewed (mean [SD] age, 65.2 [17.8] years; 47.1% female), adjudicated MOD PPV ranged from 11.0% to 18.6% across the traditional eTriggers.
For 72-hour return admissions, the LLM achieved a sensitivity of 85.7% (95% CI, 65.4%-95.0%), specificity of 56.8% (95% CI, 49.3%-64.0%), PPV of 19.8%, and NPV of 97.0%. For 10-day ICU returns, sensitivity was 100% (95% CI, 56.6%-100%), specificity 43.5% (95% CI, 25.6%-63.2%), PPV 27.8%, and NPV 100%. For floor-to-ICU escalations, sensitivity was 55.6% (95% CI, 33.7%-75.4%), specificity 64.6% (95% CI, 53.6%-74.2%), PPV 26.3%, and NPV 86.4%. The hybrid ECSC eTrigger identified 110 MODs (53.1% of 207 encounters), and blinded review of a stratified sample estimated a PPV of 45% and an NPV of 100%. Expert reviewers required a median of 5 minutes per case; restricting review to LLM-positive charts reduced review time by up to 50% without missed errors for these triggers. In stakeholder review, LLM-generated case summaries were rated highly actionable for individual clinician feedback (mean, 4.1 of 5) but less so for systems-level interventions (mean, 1.4 of 5).

Conclusions and Relevance: In this multisite retrospective study, LLMs demonstrated high NPVs across multiple eTrigger criteria. Sequential use of LLM and human review improved efficiency and detection compared with traditional eTriggers, and narrative case summaries offered a novel method to identify opportunities for clinician-level feedback. These findings suggest that LLM-based approaches may provide scalable diagnostic quality oversight in the ED.

Key Points

Question: Can sequential screening with eTriggers and a large language model (LLM) identify missed opportunities for diagnosis (MODs) in the emergency department and improve screening efficiency compared with traditional eTriggers?

Findings: In a multicenter retrospective cohort (10 EDs; 317 reviewed encounters), LLM adjudication showed high sensitivity and NPV across three established eTriggers (e.g., 72-hour returns: sensitivity 85.7%, NPV 97.0%; 10-day ICU returns: sensitivity 100%, NPV 100%).
A sequential approach was validated on a novel eTrigger for 9-day returns with select emergency care–sensitive conditions, achieving a PPV of 45% and an NPV of 100% in 40 blinded samples.

Meaning: LLM-augmented eTrigger screening offers scalable, efficient MOD detection to support diagnostic quality oversight in EDs.
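The screening measures and the sequential workflow described above (eTrigger first, LLM second, physician review only for LLM-positive charts) can be made concrete with a small calculation. This is an illustrative sketch, not the authors' analysis code. It assumes a 2×2 table of LLM output against physician SaferDx adjudication; the counts used (18, 73, 3, 96) are hypothetical, chosen only because they are arithmetically consistent with the 72-hour return percentages reported above, and the abstract does not publish the underlying table. It also assumes NNS is computed as charts reviewed per MOD found (1/PPV). The 5-minute default is the reported median review time; the names `TwoByTwo`, `screening_metrics`, and `sequential_review_summary` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical LLM screen result vs. physician adjudication for one eTrigger cohort."""
    tp: int  # LLM flags MOD, physician confirms MOD
    fp: int  # LLM flags MOD, physician finds no MOD
    fn: int  # LLM clears the case, physician finds a MOD
    tn: int  # LLM clears the case, physician finds no MOD

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def screening_metrics(t: TwoByTwo) -> dict[str, float]:
    """Accuracy of the LLM screen, treating physician review as the reference standard."""
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


def sequential_review_summary(t: TwoByTwo, minutes_per_case: float = 5.0) -> dict[str, float]:
    """Compare reading every eTrigger chart with reading only LLM-positive charts.

    Assumes NNS = charts read per confirmed MOD (1 / PPV); requires t.tp > 0.
    """
    etrigger_ppv = (t.tp + t.fn) / t.total  # MOD yield of the eTrigger alone
    llm_positive = t.tp + t.fp              # charts a physician still reads
    return {
        "nns_etrigger_only": 1 / etrigger_ppv,
        "nns_after_llm": llm_positive / t.tp,
        "minutes_etrigger_only": t.total * minutes_per_case,
        "minutes_after_llm": llm_positive * minutes_per_case,
        "review_time_saved": 1 - llm_positive / t.total,
        "mods_not_reviewed": t.fn,          # false negatives no physician sees
    }


if __name__ == "__main__":
    table = TwoByTwo(tp=18, fp=73, fn=3, tn=96)  # hypothetical counts
    print(screening_metrics(table))
    print(sequential_review_summary(table))
```

With these hypothetical counts, the eTrigger alone needs about 9 charts per MOD found (190/21), the LLM-positive subset about 5.1 (91/18), and physician reading time falls from 950 to 455 minutes (about 52%). The price is the 3 false-negative MODs that no physician reads, which is why NPV is the measure that decides whether the second stage is safe to use as a filter.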

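The abstract does not name its confidence-interval method, but the reported bounds are consistent with the Wilson score interval. Assuming hypothetical counts of 5 of 5 MODs flagged for 10-day ICU returns and 18 of 21 for 72-hour returns (neither count is stated in the abstract), the sketch below reproduces 56.6%-100% and 65.4%-95.0%. Treat both the method and the counts as assumptions; `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion such as sensitivity.

    z = 1.96 gives an approximate two-sided 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # The interval is centered on a shrunken estimate rather than on p itself,
    # so it stays informative when p is exactly 0 or 1.
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to guard against floating-point overshoot past [0, 1].
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical counts, not reported in the abstract.
    for label, x, n in [("10-day ICU returns, 5 of 5", 5, 5),
                        ("72-hour returns, 18 of 21", 18, 21)]:
        low, high = wilson_interval(x, n)
        print(f"{label}: {x / n:.1%} (95% CI, {low:.1%}-{high:.1%})")
```

The Wilson form matters for small eTrigger samples: for 5 of 5, the simpler Wald interval p ± z·sqrt(p(1-p)/n) collapses to 100%-100%, whereas Wilson gives a lower bound of about 56.6%. In practice, `statsmodels.stats.proportion.proportion_confint(count, nobs, method="wilson")` computes the same interval.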
Topics: Artificial Intelligence in Healthcare and Education; Machine Learning in Healthcare; Topic Modeling
