OpenAlex · Updated hourly · Last updated: 17 March 2026, 17:45

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the Reliability and Safety of Large Language Model-Generated Transfer Notes: A Retrospective Validation Study

2026 · 0 citations · Intelligent Medicine · Open Access

0 citations · 10 authors · Year: 2026

Abstract

Large language models (LLMs) show great promise in processing medical texts, and the increasing specialization of clinical medicine has led to a greater demand for efficient and accurate referral. This study evaluates the ability of the large language model DeepSeek-R1 to generate transfer notes for gastrointestinal surgery patients and compares its performance with that of clinician-provided transfer notes in terms of completeness, accuracy, and clinician preference. A retrospective clinical analysis was conducted on 204 referral patients who underwent gastrointestinal surgery at Qingdao University Affiliated Hospital between January 2022 and June 2025. The LLM was trained using a small-sample study of four cases, and 200 cases were used as the test set. LLM-generated transfer notes were based on a structured template comprising predefined units. Two trained clinicians reviewed the completeness of the LLM-generated transfer notes (κ = 0.719, P < 0.05). The LLM's extraction performance was assessed quantitatively by calculating recall, precision, and F1 scores for the LLM-generated transfer notes. McNemar's test was used to compare the completeness of LLM-generated and clinician-provided transfer notes. Five clinicians conducted blinded, paired preference evaluations. Across the 200 generated transfer notes, DeepSeek-R1 showed excellent overall information extraction performance, with a precision of 99% [95% CI: 98%, 99%], a recall of 97% [95% CI: 96%, 98%], and an F1 score of 0.98 [95% CI: 0.97, 0.98]. LLM-generated transfer notes were comparable in completeness to clinician-provided notes. Within the “Current Diagnosis” unit, LLM-generated notes were significantly more complete than clinician-provided notes (90% vs. 81.5%; 180 vs. 163 of 200; P < 0.05). There were no statistically significant differences across the remaining five assessment units (all P > 0.05). In the preference evaluation, clinicians markedly preferred the LLM-generated notes over the clinician-provided notes (39% [78/200] vs. 13% [26/200]); the remaining 48% [96/200] were rated as equivalent. DeepSeek-R1 can generate transfer notes that are accurate and of a quality similar to that of clinician-provided notes. Further evaluation in actual clinical settings is necessary.
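
As a quick consistency check on the reported metrics, the F1 score can be recomputed from the precision and recall point estimates given in the abstract. The following is a minimal Python sketch, not the authors' analysis code: the function name `f1_score` is an illustrative assumption, and the inputs are the rounded percentages from the abstract, so the result is only accurate to that rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Both metrics are zero, so F1 is defined as zero here.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded point estimates as reported in the abstract.
precision = 0.99
recall = 0.97

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.98"
```

The recomputed value agrees with the reported F1 score of 0.98. The confidence intervals cannot be reproduced this way, since they depend on the per-note counts, which the abstract does not give.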

Similar works