This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Abstract 4366900: Phenotyping Cardiac Surgery Patients Using Retrieval-Augmented Large Language Models
Citations: 0 · Authors: 8 · Year: 2025
Abstract
Introduction: Large language models (LLMs) are powerful tools for text extraction, but their tendency to hallucinate limits their reliability in clinical domains. We present a novel application of retrieval-augmented generation (RAG) to reduce hallucinations. Our approach restricts context to short, high-similarity segments within cardiac imaging reports, enabling more focused, conservative inference. We applied RAG to extract echocardiographic features from intraoperative transesophageal echocardiography (TEE) reports in a mixed cardiac surgery population to identify distinct patient phenotypes.
Hypothesis: We hypothesized that RAG would outperform direct LLM querying in extracting key echocardiographic features by reducing hallucinations. We aimed to group patients into clinically meaningful clusters by their echocardiographic features.
Methods: We developed a RAG pipeline that restricts LLM input to the most semantically relevant portions of TEE reports (Figure 1). We validated this pipeline on 500 manually labeled reports, extracting pre- and post-intervention left ventricular ejection fraction (LVEF), tricuspid regurgitation (TR), and right ventricular systolic function (RVSF), as well as pre-intervention aortic stenosis (AS), aortic regurgitation (AR), and mitral regurgitation (MR). RAG performance was compared to direct querying on these validation reports. Next, the pipeline was scaled to 7106 TEE reports to extract the features and intervention types. Patients were clustered using k-means, and each cluster's characteristics were analyzed.
Results: RAG's conservative behavior, favoring "not found" over potential fabrications, resulted in fewer hallucinations than direct LLM queries (Figure 2). RAG improved adjusted accuracy across all validation features (LVEF pre: +1.24%, LVEF post: +0.47%, TR pre: +3.64%, TR post: +4.67%, RVSF pre: +5.31%, RVSF post: +4.33%, AS pre: +11.44%, AR pre: +3.93%, MR pre: +1.94%).
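The retrieval step described in the Methods (restricting LLM input to short, high-similarity report segments) might look roughly like the sketch below. The abstract does not specify the embedding model, the similarity threshold, the segmenting rule, or how "not found" is produced, so every name and parameter here (`retrieve`, `extract`, `min_similarity`, `top_k`, the toy `REPORT`) is hypothetical. A bag-of-words cosine similarity stands in for a real semantic embedding, and the LLM is passed in as a plain callable.

```python
import math
import re
from collections import Counter

NOT_FOUND = "not found"

# Invented toy report; not taken from the study data.
REPORT = (
    "Pre-bypass: Left ventricular ejection fraction is 35%. "
    "Moderate tricuspid regurgitation. "
    "Post-bypass: Left ventricular ejection fraction is 45%. "
    "No pericardial effusion."
)


def tokenize(text):
    """Lowercase word/number tokens (stand-in for a real embedding model)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_segments(report):
    """Split a report into short sentence-like segments."""
    parts = re.split(r"(?<=[.;])\s+|\n+", report)
    return [p.strip() for p in parts if p.strip()]


def retrieve(report, query, top_k=2, min_similarity=0.2):
    """Return up to top_k segments whose similarity to the query passes the threshold."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in split_segments(report)]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for _, segment in kept[:top_k]]


def extract(report, query, ask_llm, top_k=2, min_similarity=0.2):
    """Query the LLM on retrieved segments only; answer 'not found' if none qualify.

    Assumption: skipping the LLM call when nothing is retrieved is one possible
    way to obtain conservative behavior; the abstract does not describe this.
    """
    segments = retrieve(report, query, top_k, min_similarity)
    if not segments:
        return NOT_FOUND
    prompt = (
        "Answer using only the excerpts below. "
        f"If the answer is absent, reply '{NOT_FOUND}'.\n"
        + "\n".join(f"- {s}" for s in segments)
        + f"\nQuestion: {query}"
    )
    return ask_llm(prompt)
```

In this sketch, a query about a feature the report never mentions returns "not found" without the model being called at all, so the model has no opportunity to fabricate a value for it.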
Clustering revealed five distinct phenotypes: (1) an aortic disease group, (2) a CABG-dominant low-risk group, (3) an advanced heart failure group, (4) a mixed valve disease group, and (5) a tricuspid disease group (Table 1).
Conclusions: Our RAG pipeline improves the reliability of LLM-based clinical data extraction from TEE reports, enabling large-scale phenotyping of heterogeneous cardiac surgery populations. This approach has potential applications for personalized risk stratification and targeted clinical decision support in cardiac surgery.
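To illustrate the k-means clustering step named in the abstract, here is a minimal sketch of Lloyd's algorithm on extracted features. The abstract does not report the feature encoding, scaling, initialization, or how the number of clusters was chosen. The z-score standardization, the deterministic farthest-first initialization, the 0 to 3 severity grades, and the six toy patients below are therefore all assumptions made for illustration, not the authors' method or data.

```python
import math


def standardize(rows):
    """Z-score each column so LVEF (%) and 0-3 severity grades share a scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic init: start at the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented toy patients: [LVEF %, AS grade 0-3, TR grade 0-3].
PATIENTS = [
    [60, 3, 0], [55, 3, 0],   # severe aortic stenosis, preserved LVEF
    [25, 0, 1], [20, 0, 1],   # low LVEF
    [60, 0, 3], [58, 0, 3],   # severe tricuspid regurgitation
]

labels, centroids = kmeans(standardize(PATIENTS), k=3)
```

On this toy input each pair of similar patients ends up sharing a cluster label. With real data, the choice of k and the handling of "not found" values would need their own justification.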