This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Developing and Evaluating Large Language Model–Generated Emergency Medicine Handoff Notes
Citations: 57
Authors: 9
Year: 2024
Abstract
Importance: An emergency medicine (EM) handoff note generated by a large language model (LLM) has the potential to reduce physician documentation burden without compromising the safety of EM-to-inpatient (IP) handoffs.

Objective: To develop LLM-generated EM-to-IP handoff notes and evaluate their accuracy and safety compared with physician-written notes.

Design, Setting, and Participants: This cohort study used EM patient medical records with acute hospital admissions that occurred in 2023 at NewYork-Presbyterian/Weill Cornell Medical Center. A customized clinical LLM pipeline was trained, tested, and evaluated to generate templated EM-to-IP handoff notes. LLM-generated handoff notes were compared with physician-written notes using both conventional automated methods (ie, recall-oriented understudy for gisting evaluation [ROUGE], bidirectional encoder representations from transformers score [BERTScore], and source chunking approach for large-scale inconsistency evaluation [SCALE]) and a novel patient safety-focused framework. Data were analyzed from October 2023 to March 2024.

Exposure: LLM-generated EM handoff notes.

Main Outcomes and Measures: LLM-generated handoff notes were evaluated for (1) lexical similarity with respect to physician-written notes using ROUGE and BERTScore; (2) fidelity with respect to source notes using SCALE; and (3) readability, completeness, curation, correctness, usefulness, and implications for patient safety using a novel framework.

Results: In this study of 1600 EM patient records (832 [52%] female and mean [SD] age of 59.9 [18.9] years), LLM-generated handoff notes, compared with physician-written ones, had higher ROUGE (0.322 vs 0.088), BERTScore (0.859 vs 0.796), and SCALE scores (0.691 vs 0.456), indicating that the LLM-generated summaries exhibited greater similarity and more detail. As reviewed by 3 board-certified EM physicians, a subsample of 50 LLM-generated summaries had a mean (SD) usefulness score of 4.04 (0.86) out of 5 (compared with 4.36 [0.71] for physician-written) and mean (SD) patient safety scores of 4.06 (0.86) out of 5 (compared with 4.50 [0.56] for physician-written). None of the LLM-generated summaries were classified as a critical patient safety risk.

Conclusions and Relevance: In this cohort study of 1600 EM patient medical records, LLM-generated EM-to-IP handoff notes were rated superior to physician-written summaries by conventional automated evaluation methods, but marginally inferior in usefulness and safety under a novel evaluation framework. This study suggests the importance of a physician-in-the-loop implementation design for this model and demonstrates an effective strategy to measure preimplementation patient safety of LLM models.
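The automated evaluation described in the abstract rests on n-gram overlap metrics such as ROUGE. As a minimal, hypothetical sketch (not the authors' actual pipeline, which used ROUGE, BERTScore, and SCALE implementations), ROUGE-1 precision, recall, and F1 between a generated note and a reference note can be computed from clipped unigram overlap:

```python
from collections import Counter

def rouge1_scores(reference: str, candidate: str) -> dict:
    """Compute ROUGE-1 precision, recall, and F1 from unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each unigram counts at most as often as it
    # appears in the reference (Counter intersection takes the minimum).
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative (made-up) note fragments:
scores = rouge1_scores(
    "patient admitted with chest pain and shortness of breath",
    "patient admitted with chest pain",
)
print(scores)  # precision 1.0, recall 5/9, f1 5/7
```

Production evaluations typically use established packages (e.g., Google's `rouge-score`) that add stemming and ROUGE-2/ROUGE-L variants; this sketch only illustrates the underlying overlap computation.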
Related Works
Nurse-Staffing Levels and the Quality of Care in Hospitals
2002 · 2,257 citations
Role of Computerized Physician Order Entry Systems in Facilitating Medication Errors
2005 · 2,132 citations
Deficits in Communication and Information Transfer Between Hospital-Based and Primary Care Physicians
2007 · 2,079 citations
Burnout and Medical Errors Among American Surgeons
2010 · 2,072 citations
The Incidence and Severity of Adverse Events Affecting Patients after Discharge from the Hospital
2003 · 1,903 citations