OpenAlex · Updated hourly · Last updated: Mar 12, 2026, 06:36

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A rubric‑based comparison of Code Red and specialists in managing CAR‑T–related cytokine release syndrome and ICANS

2025 · 0 citations · Blood · Open Access

Citations: 0 · Authors: 10 · Year: 2025

Abstract

Background: Large language models (LLMs) coupled with retrieval-augmented generation (RAG) can deliver point-of-care, guideline-anchored answers for complex oncologic toxicities. When iteratively refined with domain experts, these systems may rival or even surpass individual specialist performance.

Objectives: To present and benchmark a purpose-built module of Code Red (https://chatbot.codigorojo.tech/), an educational, medicine-wide generative-AI project, designed specifically to improve the clinical management of toxicities from immune effector cell therapies, particularly CAR-T (e.g., CRS, ICANS). The module aims to provide near–real-time, reference-backed recommendations that have been iteratively refined with domain experts, addressing current deviations from guideline-based care and the resulting heterogeneity in real-world practice.

Methods: Three simulated CAR-T toxicity cases were constructed to span varying grades and scenarios. Each case was independently answered by three CAR-T–experienced hematologist/oncologists (n=3 per case) and by Code Red. An external LLM (ChatGPT o3) served as a blinded adjudicator, applying a seven-item rubric: clinical accuracy/guideline concordance (45%), safety & risk mitigation (15%), completeness/contextualization (10%), actionability & clarity (10%), reference quality (10%), transparency about uncertainty (5%), and form/communication efficiency (5%). Each item was scored 0–10; scores were standardized across cases and raters and then combined into a single weighted composite to select the winner per case. Code Red uses a RAG pipeline over a curated corpus of CAR-T toxicity guidelines and primary literature, plus rule-based safeguards for dosing and citation integrity.

Results: Using the seven-item, 0–10 rubric (weights: 45/15/10/10/10/5/5%), the standardized composite scores were Code Red 8.8, Expert 1 8.0, Expert 2 7.6, and Expert 3 5.3 (all out of 10). Code Red exceeded the top individual expert by 0.8 points (an ~8% absolute gain) and the aggregated expert mean (≈7.0/10; median 7.6; range 5.3–8.0) by 1.8 points (a ~26% relative gain). Criterion-wise, Code Red scored 10/10 in clinical accuracy/guideline concordance and in safety/risk mitigation, and ≥9/10 in completeness, actionability, and structure/efficiency; transparency about uncertainty was moderate (6/10). Averaged across the panel, the experts reached 8.0 in accuracy, 8.0 in safety, 7.7 in completeness, 8.3 in actionability, 3.3 in transparency, and 7.0 in structure/efficiency. After weighting, Code Red ranked first in every case, and the blinded adjudicator selected it as the top response throughout. Its margin was driven by a consistently protocol-level presentation, with explicit monitoring schedules, predefined intervention thresholds (e.g., tocilizumab at 24 h of persistent fever), steroid regimens, ICU escalation criteria, and tertiary options (anakinra/siltuximab), delivered in concise, highly actionable language.

Conclusions: A lean, expert-guided RAG system (Code Red) can outperform individual CAR-T specialists on simulated toxicity-management scenarios while delivering rapid guidance. Ongoing improvement will rely on continuous user feedback and automated literature surveillance to preserve patient safety and guideline fidelity.
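The Methods describe combining seven 0–10 rubric items into a single weighted composite (weights 45/15/10/10/10/5/5%). A minimal Python sketch of that weighting step follows; the item keys and the example scores are hypothetical, and the cross-case/rater standardization the abstract mentions is omitted for brevity:

```python
# Weights taken from the abstract (45/15/10/10/10/5/5 %); item keys are
# illustrative shorthand, not the paper's exact labels.
WEIGHTS = {
    "accuracy": 0.45,       # clinical accuracy / guideline concordance
    "safety": 0.15,         # safety & risk mitigation
    "completeness": 0.10,   # completeness / contextualization
    "actionability": 0.10,  # actionability & clarity
    "references": 0.10,     # reference quality
    "transparency": 0.05,   # transparency about uncertainty
    "structure": 0.05,      # form / communication efficiency
}

def composite(scores: dict) -> float:
    """Weighted composite of per-item 0-10 rubric scores."""
    return sum(WEIGHTS[item] * scores[item] for item in WEIGHTS)

# Hypothetical response scored on all seven items (not real study data).
example = {"accuracy": 9, "safety": 8, "completeness": 7,
           "actionability": 8, "references": 6, "transparency": 5,
           "structure": 7}
print(round(composite(example), 2))
```

Because the accuracy item carries 45% of the weight, a response's guideline concordance dominates the composite, which is consistent with Code Red's first-place ranking being driven largely by its 10/10 accuracy score.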
