This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Abstract 2747: Source discipline matters: Guideline anchored large language model outperforms Open Evidence for decision support in acute leukemias.
Citations: 0
Authors: 8
Year: 2026
Abstract
Background
Acute leukemia is one of the most complex and rapidly evolving domains in hematologic oncology, where treatment selection depends on factors such as molecular subtype and performance status. The National Comprehensive Cancer Network (NCCN) provides updated, lineage-specific algorithms for Acute Myeloid Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL), yet these guidelines are dense and frequently revised. Large language models (LLMs) may assist clinicians in synthesizing these data, but the reliability of their outputs depends critically on their evidence sources. This study compared an NCCN-anchored retrieval-augmented model (RAG GPT-5) with Open Evidence (OE), a model linked to journal-based sources such as NEJM and JAMA, to assess accuracy, safety, and guideline concordance in acute leukemia decision support.

Methods
Forty de-identified AML and ALL vignettes were independently evaluated by two models: Open Evidence (O1) and an NCCN-anchored retrieval-augmented GPT-5 model (O2). Reviewers were blinded to model identity and rated each response using a modified Generative Performance Score (mGPS = Guideline Concordance − Hallucination Penalty; range −1.0 to +1.0). Statistical comparison used independent-samples t-tests.

Results
The RAG model (O2) demonstrated significantly higher overall performance (mean = 0.84, SD = 0.25) compared with Open Evidence (O1; mean = 0.70, SD = 0.32); t(≈78) = −2.17, p = 0.033. Qualitative review revealed key distinctions in clinical reasoning:
• Open Evidence frequently hallucinated agents (e.g., ipilimumab), omitted prior therapy context, and failed to adjust for infection recovery or cardiac risk before chemotherapy.
• RAG GPT-5 exclusively cited NCCN recommendations, with minor rounding errors (e.g., ATRA dose), and occasionally defaulted to conservative but still guideline-concordant dosing (e.g., daunorubicin).
• Neither model fully addressed dual-tumor or BCR-ABL-positive scenarios, and both under-recognized recent updates such as menin inhibitors for MLL-rearranged AML, which are emerging but not yet NCCN-listed.
Variance was smaller for the RAG system, indicating more consistent performance across cases.

Conclusions
In acute leukemias, evidence source materially alters LLM behavior and reliability. Guideline-anchored retrieval produced significantly more NCCN-concordant recommendations and fewer hallucinations than OE. While both systems occasionally missed nuanced treatment history or recent investigational agents, only OE introduced clinically unsafe suggestions. These findings support NCCN-anchored RAG as the safer and more consistent foundation for LLM-based decision support in acute leukemias, where precision and patient context are paramount. Future work should expand to relapse and transplant scenarios with prospective clinician validation.

Citation Format: Peter Palumbo, Connor Yost, Emilio Del Toro, Demetrios Garbis, Peter Odutola, Yash Kumar, Arturo Loaiza, Matthew Sullivan. Source discipline matters: Guideline anchored large language model outperforms Open Evidence for decision support in acute leukemias [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 2747.
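To make the scoring and statistics concrete, the mGPS formula and the independent-samples comparison described in the Methods can be sketched in Python. This is a minimal illustration only: the per-vignette scores below are hypothetical placeholders (not the study's data), and the pooled-variance t-statistic shown is one standard form of the independent-samples test; the abstract does not specify whether a pooled or Welch variant was used.

```python
import math

def mgps(guideline_concordance, hallucination_penalty):
    """Modified Generative Performance Score: concordance minus
    hallucination penalty, clamped to the reported range [-1.0, +1.0]."""
    return max(-1.0, min(1.0, guideline_concordance - hallucination_penalty))

def pooled_t(a, b):
    """Independent-samples t-statistic (pooled variance) for two score
    lists; returns the t value and degrees of freedom (na + nb - 2)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variance of a
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)  # sample variance of b
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)  # pooled variance
    t = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical placeholder ratings, NOT the study's data:
o1_scores = [mgps(0.8, 0.2), mgps(1.0, 0.0), mgps(0.5, 0.3)]  # Open Evidence
o2_scores = [mgps(1.0, 0.0), mgps(0.9, 0.0), mgps(1.0, 0.1)]  # RAG GPT-5
t, df = pooled_t(o1_scores, o2_scores)
```

With 40 vignettes per model, the degrees of freedom would be 40 + 40 − 2 = 78, matching the reported t(≈78); a negative t here corresponds to O1 scoring below O2, as in the abstract.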