This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Promises and Pitfalls of Large Language Models use to interpret Healthcare Guidelines
Citations: 0
Authors: 11
Year: 2025
Abstract
Healthcare guidelines provide evidence-based recommendations for disease management, but their complexity can make interpretation challenging in patient-specific contexts. Large Language Models (LLMs) have been proposed to help query such guidelines, but they may produce inaccurate or incomplete responses when applied to clinical tasks. Retrieval-Augmented Generation (RAG) methods, which enhance LLM performance by incorporating relevant excerpts from healthcare guidelines, represent one strategy to improve accuracy. In this study, we present a threefold contribution toward evaluating LLMs for healthcare guideline interpretation. First, in collaboration with board-certified physicians, we developed GuidelineQA, a clinically curated, in-house question-answer dataset covering three widely used guidelines on cardiovascular disease, diabetes, and colon cancer prevention. The dataset includes a variety of questions, including those commonly needed by physicians, a frequently asked patient question, and adversarial questions designed to challenge the LLMs. Second, we assessed the performance of two state-of-the-art models, GPT-4o (closed-source) and LLaMA-2-7B (open-source), under both standard (non-augmented) and RAG settings. Third, we conducted a comprehensive evaluation using both quantitative metrics (e.g., BERTScore) and qualitative human assessments.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations