This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Abstract 4366900: Phenotyping Cardiac Surgery Patients Using Retrieval-Augmented Large Language Models

2025 · 0 citations · Circulation

Abstract

Introduction: Large language models (LLMs) are powerful tools for text extraction, but their tendency to hallucinate limits their reliability in clinical domains. We present a novel application of retrieval-augmented generation (RAG) to reduce hallucinations. Our approach restricts context to short, high-similarity segments within cardiac imaging reports, enabling more focused, conservative inference. We applied RAG to extract echocardiographic features from intraoperative transesophageal echocardiography (TEE) reports in a mixed cardiac surgery population to identify distinct patient phenotypes.

Hypothesis: We hypothesized that RAG would outperform direct LLM querying in extracting key echocardiographic features by reducing hallucinations. We aimed to group patients into clinically meaningful clusters by their echocardiographic features.

Methods: We developed a RAG pipeline that restricts LLM input to the most semantically relevant portions of TEE reports (Figure 1). We validated this pipeline on 500 manually labeled reports, extracting pre- and post-intervention left ventricular ejection fraction (LVEF), tricuspid regurgitation (TR), and right ventricular systolic function (RVSF), as well as pre-intervention aortic stenosis (AS), aortic regurgitation (AR), and mitral regurgitation (MR). RAG performance was compared to direct querying on these validation reports. Next, the pipeline was scaled to 7106 TEE reports to extract the features and intervention types. Patients were clustered using k-means, and each cluster’s characteristics were analyzed.

Results: RAG’s conservative behavior, favoring “not found” over potential fabrications, resulted in fewer hallucinations than direct LLM queries (Figure 2). RAG improved adjusted accuracy across all validation features (LVEF pre: +1.24%, LVEF post: +0.47%, TR pre: +3.64%, TR post: +4.67%, RVSF pre: +5.31%, RVSF post: +4.33%, AS pre: +11.44%, AR pre: +3.93%, MR pre: +1.94%). Clustering revealed five distinct phenotypes: (1) an aortic disease group, (2) a CABG-dominant low-risk group, (3) an advanced heart failure group, (4) a mixed valve disease group, and (5) a tricuspid disease group (Table 1).

Conclusions: Our RAG pipeline improves the reliability of LLM-based clinical data extraction from TEE reports, enabling large-scale phenotyping of heterogeneous cardiac surgery populations. This approach has potential applications for personalized risk stratification and targeted clinical decision support in cardiac surgery.
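
The abstract does not say how the retrieval step is implemented, so the following is only a minimal sketch of the general idea of restricting LLM input to short, high-similarity report segments. Everything in it is an assumption made for illustration: the function names (`split_segments`, `retrieve`, `build_prompt`), the bag-of-words cosine similarity (a real pipeline would more likely use learned embeddings), the `top_k` and `min_similarity` values, and the choice to return "not found" without calling a model when no segment is similar enough.

```python
import math
import re
from collections import Counter
from typing import List, Optional


def split_segments(report: str) -> List[str]:
    """Split a report into short sentence-level segments (hypothetical rule)."""
    parts = re.split(r"(?<=[.;])\s+", report.strip())
    return [p for p in parts if p]


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(report: str, query: str, top_k: int = 2,
             min_similarity: float = 0.2) -> List[str]:
    """Return up to top_k segments whose similarity to the query passes the threshold."""
    q = bag_of_words(query)
    scored = [(cosine(bag_of_words(s), q), s) for s in split_segments(report)]
    kept = sorted((p for p in scored if p[0] >= min_similarity), reverse=True)
    return [segment for _, segment in kept[:top_k]]


def build_prompt(report: str, feature: str) -> Optional[str]:
    """Build a restricted-context prompt, or None when nothing relevant is retrieved.

    Returning None lets the caller record "not found" instead of asking the
    model to answer from an irrelevant context (an assumed design choice).
    """
    context = retrieve(report, feature)
    if not context:
        return None
    return (
        f"Context: {' '.join(context)}\n"
        f"Question: What is the {feature}? "
        "Answer 'not found' if the context does not state it."
    )


if __name__ == "__main__":
    report = (
        "The left ventricular ejection fraction is 55%. "
        "There is mild tricuspid regurgitation. "
        "The aortic valve is trileaflet."
    )
    print(build_prompt(report, "left ventricular ejection fraction"))
    print(build_prompt(report, "pericardial effusion"))
```

In this sketch a query that shares no terms with the report yields no context at all, which is one simple way to favor "not found" over a fabricated value. A term-overlap similarity like this one can still retrieve the wrong segment (for example, a mitral regurgitation query matching a tricuspid regurgitation sentence), which is why the prompt also asks the model to answer "not found" when the context is insufficient.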

Topics

Machine Learning in Healthcare; Artificial Intelligence in Healthcare and Education; COVID-19 diagnosis using AI
