OpenAlex · Updated hourly · Last updated: 14.03.2026, 13:38

This is an overview page with metadata about this scientific work. The full article is available from the publisher.

Abstract PS3-06-29: Retrieval-augmented GPT-4 improves NCCN-concordant breast cancer treatment recommendations

2026 · 0 citations · Clinical Cancer Research

Citations: 0 · Authors: 10 · Year: 2026

Abstract

Guideline-concordant breast cancer treatment selection is increasingly complex because biomarker-driven therapies are evolving rapidly. Large language models (LLMs) such as GPT-4 are a potential tool for supporting clinical decision-making; however, hallucinations and deviations from evidence-based guidelines remain significant concerns. This study evaluated whether retrieval-augmented generation (RAG) using National Comprehensive Cancer Network (NCCN) guidelines improves GPT-4 performance on treatment recommendation tasks.

We developed a retrieval-augmented GPT-4 agent (RAG-GPT) by indexing the 2025 NCCN Breast Cancer Guidelines into a vector database using sentence embeddings. Forty clinical vignettes were created from published case reports to reflect diverse scenarios, including HER2-low, triple-negative, and BRCA-mutated disease across all stages. Two models, a baseline GPT-4 (without augmentation) and RAG-GPT, were prompted with structured templates requesting treatment plans, rationale, clinical trial options, and citations.

Outputs were independently scored by two blinded oncology-trained reviewers using the modified Generative AI Performance Score (mG-PS). The mG-PS scores each output in two categories: guideline concordance (gold standard = 1.0, acceptable = 0.5, non-concordant = 0.0) and hallucination errors (severe = -1.0, moderate = -0.5, mild = -0.25, none = 0.0). Guideline concordance was judged against the NCCN guidelines and against what an oncologist applying those guidelines would do in each case. "Hallucination" refers to falsehoods produced by an AI system, such as fabricated facts or studies, incorrect information, improper doses, or inappropriate or wrong treatments. To ensure consistency and minimize inter-rater variability, reviewers were given comprehensive scoring guidelines and detailed instructions for each evaluation metric. Scores were normalized to the range -1.0 to 1.0 to facilitate comparison between the tools.
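The two mG-PS categories can be illustrated with a short scoring sketch. It assumes, which the abstract does not state, that an output's score is simply concordance plus hallucination penalty; with the listed values that sum always falls in [-1.0, 1.0], the range the abstract reports. The function names and label strings are hypothetical, and the authors' actual normalization and reviewer-averaging steps may differ.

```python
from statistics import mean

# Category values exactly as listed in the abstract.
CONCORDANCE = {"gold_standard": 1.0, "acceptable": 0.5, "non_concordant": 0.0}
HALLUCINATION = {"severe": -1.0, "moderate": -0.5, "mild": -0.25, "none": 0.0}


def mg_ps(concordance: str, hallucination: str) -> float:
    """Score one output, ASSUMING the two categories are added together.

    With these values the result always lies in [-1.0, 1.0]; this is an
    illustrative assumption, not the published mG-PS definition.
    """
    return CONCORDANCE[concordance] + HALLUCINATION[hallucination]


def mean_mg_ps(ratings: list[tuple[str, str]]) -> float:
    """Average per-output scores over a set of (concordance, hallucination) labels."""
    return mean(mg_ps(c, h) for c, h in ratings)
```

For example, a gold-standard plan with no hallucination scores 1.0, while a non-concordant plan with a severe hallucination scores -1.0, so a model whose errors outweigh its concordant answers can end up with a negative mean, as the baseline GPT-4 did.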
Readability and rationale clarity were rated on a 5-point Likert scale, and inter-rater reliability was assessed using Cohen's kappa. RAG-GPT achieved a significantly higher mean mG-PS (0.599 ± 0.102, 95% CI: 0.498 to 0.702) than baseline GPT-4, which not only scored lower but had a negative mean (-0.21 ± 0.143, 95% CI: -0.353 to -0.067). An independent-samples t-test confirmed the difference (t(78) = 29.1, p < 0.0001). Cohen's kappa for the mG-PS was 0.69, indicating substantial agreement. Severe hallucinations were absent from RAG-GPT outputs (0/40) but occurred in 8/40 baseline GPT-4 outputs. Readability and rationale clarity also differed notably: the augmented LLM scored significantly higher (M = 4.35, SD = 0.66) than the naïve LLM (M = 3.05, SD = 1.06), and an independent-samples t-test confirmed this difference (t(65.3) = 6.58, p < 0.0001), further supporting the benefit of augmenting large language models. Cohen's kappa for the readability and rationale clarity scale was 0.76, indicating substantial agreement.

Limitations include the small sample size; further fine-tuning and larger real-world validation studies remain necessary. Linking GPT-4 to NCCN guidelines via retrieval-augmented generation significantly improves treatment accuracy and reduces hallucination severity in breast cancer care scenarios. While further refinement is needed, the reduced incidence of severe hallucinations, the increase in guideline-concordant recommendations, and the improved readability support further development and prospective validation of guideline-anchored LLMs for clinical use.

Citation Format: C. Yost, A. Aseem, B. Callas, S. Monick, L. Bonilla, R. Nguyen, R. Galamaga, S. Osborn, Y. Kumar, N. Ertz-Archambault. Retrieval-augmented GPT-4 improves NCCN-concordant breast cancer treatment recommendations [abstract].
In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS3-06-29.
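The two t statistics in the results can be recomputed from the reported summary statistics alone. This is a minimal sketch under stated assumptions: the ± values are treated as standard deviations, each model is assumed to have n = 40 scored outputs, and the integer df of 78 is read as a pooled-variance (Student) test while the fractional df of 65.3 is read as Welch's test. These readings are our inference, not something the abstract states. The function names are hypothetical, and p-values are omitted because the standard library has no t distribution.

```python
import math


def students_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple[float, int]:
    """Pooled-variance (Student) two-sample t statistic from summary stats."""
    # Pooled variance weights each group's variance by its degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def welch_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int) -> tuple[float, float]:
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # mG-PS: RAG-GPT vs. baseline GPT-4 (assumed n = 40 each).
    t, df = students_t(0.599, 0.102, 40, -0.21, 0.143, 40)
    print(f"mG-PS: t({df}) = {t:.1f}")
    # Readability and rationale clarity: augmented vs. naive LLM.
    t2, df2 = welch_t(4.35, 0.66, 40, 3.05, 1.06, 40)
    print(f"Readability: t({df2:.1f}) = {t2:.2f}")
```

Under these assumptions the script prints t(78) = 29.1 and t(65.3) = 6.58, matching the reported values, which suggests the ± figures are standard deviations and that the readability comparison used Welch's correction.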
