This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating Adherence to Canadian Radiology Guidelines for Incidental Hepatobiliary Findings Using RAG-Enabled LLMs
16
Citations
2
Authors
2025
Year
Abstract
<b>Purpose:</b> Large language models (LLMs) have the potential to support clinical decision-making but often lack training on the latest clinical guidelines. Retrieval-augmented generation (RAG) may enhance guideline adherence by dynamically integrating external information. This study evaluates the performance of two LLMs, GPT-4o and o1-mini, with and without RAG, in adhering to Canadian radiology guidelines for incidental hepatobiliary findings. <b>Methods:</b> A customized RAG architecture was developed to integrate guideline-based recommendations into LLM prompts. Clinical cases were curated and used to prompt models with and without RAG. Primary analyses assessed the rate of guideline adherence with comparisons made between LLMs with and without RAG. Secondary analyses evaluated reading ease, grade level, and response times for generated outputs. <b>Results:</b> A total of 319 clinical cases were evaluated. Adherence rates were 81.7% for GPT-4o without RAG, 97.2% for GPT-4o with RAG, 79.3% for o1-mini without RAG, and 95.1% for o1-mini with RAG. Model performance differed significantly across groups, with RAG-enabled configurations outperforming their non-RAG counterparts (<i>P</i> < .05). RAG-enabled models demonstrated improved reading ease and lower grade level scores; however, all model outputs remained at advanced comprehension levels. Response times for RAG-enabled models increased slightly due to additional retrieval processing but remained clinically acceptable. <b>Conclusions:</b> RAG-enabled LLMs significantly improved adherence to Canadian radiology guidelines for incidental hepatobiliary findings without compromising readability or response times. This approach holds promise for advancing evidence-based care and warrants further validation across broader clinical settings.
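The abstract describes a RAG architecture that integrates guideline-based recommendations into LLM prompts. The study does not detail its implementation, but the general pattern of retrieving relevant guideline text and prepending it to the clinical case can be sketched as follows. The guideline snippets, the word-overlap retriever, and the prompt template below are all illustrative assumptions, not the authors' actual system.

```python
# Minimal sketch of a retrieval-augmented prompt pipeline, assuming a toy
# word-overlap retriever. The guideline snippets and prompt template are
# hypothetical; the study's actual RAG architecture is not described at
# this level of detail.

def tokenize(text):
    """Lowercase and split text into a set of words."""
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    """Rank guideline snippets by word overlap with the case description."""
    q = tokenize(query)
    scored = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return scored[:k]

def build_prompt(case, guidelines):
    """Prepend the retrieved guideline excerpts to the clinical case."""
    context = "\n".join(f"- {g}" for g in guidelines)
    return (
        "Follow these guideline excerpts when recommending management:\n"
        f"{context}\n\n"
        f"Clinical case: {case}\n"
        "Recommendation:"
    )

# Hypothetical guideline excerpts (not taken from the actual Canadian guidelines).
GUIDELINES = [
    "Incidental gallbladder polyp under 7 mm requires no follow-up",
    "Incidental liver lesion over 1 cm in a high-risk patient warrants contrast-enhanced MRI",
    "A simple hepatic cyst is benign and needs no follow-up",
]

case = "CT shows an incidental 5 mm gallbladder polyp"
top = retrieve(case, GUIDELINES, k=1)
prompt = build_prompt(case, top)
```

In the study, the assembled prompt would then be sent to GPT-4o or o1-mini; the retrieval step adds the modest latency overhead the authors report for the RAG-enabled configurations.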
Related Works
"Why Should I Trust You?"
2016 · 14,281 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,646 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,399 citations