
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Abstract 16401: Optimizing ChatGPT to Detect VT Recurrence From Complex Medical Notes

2023 · 3 citations · Circulation

Citations: 3 · Authors: 13 · Year: 2023

Abstract

Introduction: Large language models (LLMs), such as ChatGPT, have a remarkable ability to interpret natural language using text questions (prompts) applied to gigabytes of data on the World Wide Web. However, the performance of ChatGPT is less impressive when addressing nuanced questions from finite repositories of lengthy, unstructured clinical notes (Fig A).

Hypothesis: The performance of ChatGPT in identifying sustained ventricular tachycardia (VT) or fibrillation after ablation from free-text medical notes is improved by optimizing the question and adding in-context sample notes with correct responses (‘prompt engineering’).

Methods: We curated a dataset of N = 125 patients with implantable defibrillators (32.0% female, LVEF 48.9±13.9%, 61.7±14.0 years), split into development (N = 75) and testing (N = 50) sets of 307 and 337 notes, with 256.8±95.1 and 289.8±103 words, respectively. Notes were deidentified. Gold standard labels for recurrent VT (Yes, No, Unknown) were provided by experts. We applied GPT-3.5 to the test set (N = 337 notes), using 1 of 3 prompts (“Does the patient have sustained VT or VF after ablation” or 2 others), systematically adding 1-5 “training” examples, and repeating experiments 10 times (51,561 inquiries).

Results: At baseline, GPT achieved an F1 score of 38.6%±19.4% (mean across 3 prompts; Fig B). Increasing the number of examples progressively improved mean accuracy and reduced variance. The optimal result was the illustrated prompt plus 5 in-context examples, with an F1 score of 84.6%±6.4% (p<0.05).

Conclusions: ChatGPT can accurately identify VT recurrence from small numbers of complex medical notes with optimal prompt engineering. Future studies should define optimal context for different medical questions and domains. These findings pave the way for automated analysis of large medical repositories to broadly improve decision making.
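The in-context prompting setup described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the names `Example`, `build_prompt`, and `macro_f1` are hypothetical, the prompt layout and the note strings are placeholders, and macro-averaging over the three labels is an assumption, since the abstract does not say which F1 variant was reported. The final line checks one decomposition that happens to match the reported 51,561 inquiries (337 notes × 3 prompts × one baseline run plus 5 example counts × 10 repetitions); the abstract does not spell out this breakdown.

```python
from dataclasses import dataclass

LABELS = ("Yes", "No", "Unknown")  # label set named in the abstract


@dataclass
class Example:
    note: str   # de-identified note text (placeholder strings below)
    label: str  # expert label, one of LABELS


def build_prompt(question: str, examples: list[Example], note: str) -> str:
    """Assemble a few-shot prompt: question, labelled examples, then the target note."""
    parts = [f"{question}\nAnswer with one of: {', '.join(LABELS)}."]
    for ex in examples:
        parts.append(f"Note: {ex.note}\nAnswer: {ex.label}")
    # The target note is left unanswered so the model completes the label.
    parts.append(f"Note: {note}\nAnswer:")
    return "\n\n".join(parts)


def macro_f1(gold: list[str], pred: list[str]) -> float:
    """Unweighted mean of per-label F1 (assumed variant, not confirmed by the source)."""
    scores = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Question text quoted in the abstract; the notes are made-up placeholders.
QUESTION = "Does the patient have sustained VT or VF after ablation"
shots = [Example("Placeholder note A.", "Yes"), Example("Placeholder note B.", "No")]
prompt = build_prompt(QUESTION, shots, "Placeholder target note.")

# Assumed breakdown of the inquiry count: notes x prompts x (baseline + 5 shot counts x 10 repeats).
inquiries = 337 * 3 * (1 + 5 * 10)
```

Passing an empty `examples` list gives the zero-shot baseline prompt, and lengthening the list from 1 to 5 entries mirrors the "adding 1-5 examples" sweep described above.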

Related works