Leveraging large language models (LLM) to guide adjuvant chemotherapy (AC) recommendations in stage III colorectal cancer (CRC).
Citations: 0
Authors: 20
Year: 2026
Abstract 103

Background: Clinicians spend substantial time extracting and synthesizing information to risk-stratify patients with colorectal cancer (CRC) and make adjuvant chemotherapy (AC) recommendations. By automating parts of this process, LLMs could reduce documentation burden and variability in decision-making.

Methods: We selected 144 patients with stage III, mismatch repair-proficient (pMMR) CRC treated at Mayo Clinic between 2017 and 2021. HIPAA-compliant Gemini-2.0-Flash-001 and GPT-4o extracted clinical variables (age, neuropathy grade, and pathological stage) from electronic health records (EHRs) and generated AC recommendations. Two approaches were compared: (1) a rule-based algorithm of nested if/else statements that translated the extracted variables into NCCN-based recommendations, and (2) a dynamic LLM agent that generated recommendations from a structured prompt informed by NCCN guidelines. Concordance with the treating physicians' recommendations was assessed, and discordant cases were reviewed by a blinded gastrointestinal (GI) oncologist.

Results: The cohort comprised 144 patients with a median age of 67 years (IQR 66–68); 48.6% were female. The LLM pipeline extracted neuropathy grade with 100% accuracy and pathological T and N stages with 99.3% accuracy. Concordance with physician recommendations was 93.1% for the rule-based algorithm and 62.5% for the LLM agent. In discordant cases, the blinded GI oncologist preferred the rule-based algorithm over the treating physician in 100% of cases and the LLM agent over the treating physician in 82.0% (Table 1).

Conclusions: This exploratory work shows that LLM agents can reliably extract clinical variables and generate evidence-based treatment recommendations for stage III CRC. The high rate of expert preference for LLM-derived recommendations in discordant cases is intriguing but limited by the small sample size and possible adjudicator bias. Discordant physician recommendations reflect the complexity of clinical decision-making, including patient preferences, performance status, comorbidities, and subtle findings not fully captured in EHRs.
Rather than replacing clinical judgment, we envision LLMs as decision-support tools, analogous to an advanced trainee who reviews charts and presents evidence-based recommendations to the attending physician, who retains ultimate authority. Larger multi-institutional studies with multiple adjudicators are needed to validate these findings and to clarify the role of AI in oncology.

Table 1. Patient characteristics, LLM extraction accuracy, and concordance with physician recommendations.

Metric | Result
Total patients | 144
Median age, years (IQR) | 67 (66–68)
Female | 48.6%
LLM extraction accuracy: neuropathy grade | 100%
LLM extraction accuracy: pathological T and N stages | 99.3%
Primary concordance: rule-based algorithm vs. physician | 93.1%
Adjudicator preference for rule-based algorithm over treating physician (discordant cases) | 100%
Concordance: LLM agent vs. physician | 62.5%
Adjudicator preference for LLM agent over treating physician (discordant cases) | 82.0%
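The Methods describe two computational pieces: a nested if/else rule set that turns extracted variables into a recommendation, and a concordance metric against the treating physician. The Python sketch below illustrates both under stated assumptions. The abstract does not publish the authors' rules, so the class `ExtractedVariables`, the functions `rule_based_recommendation` and `concordance`, the cutoffs `NEUROPATHY_CUTOFF` and `AGE_CUTOFF`, and all recommendation labels are hypothetical placeholders. They are neither the authors' algorithm nor clinical guidance. Only the T4-or-N2 high-risk split follows the commonly used stage III risk grouping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedVariables:
    """Variables the abstract says were extracted from the EHR."""
    age: int
    neuropathy_grade: int  # assumption: 0 (none) to 4 grading scale
    t_stage: str           # pathological T stage, e.g. "T3" or "T4a"
    n_stage: str           # pathological N stage, e.g. "N1b" or "N2a"


# Hypothetical cutoffs for illustration only; the abstract does not report
# the thresholds or branches the authors actually used.
NEUROPATHY_CUTOFF = 2
AGE_CUTOFF = 70


def rule_based_recommendation(v: ExtractedVariables) -> str:
    """Map extracted variables to a placeholder recommendation label using
    nested if/else branches (illustrative only, not clinical guidance)."""
    # Common stage III risk split: T4 and/or N2 disease is treated as high risk.
    high_risk = (v.t_stage.upper().startswith("T4")
                 or v.n_stage.upper().startswith("N2"))
    # Pre-existing neuropathy is checked first because oxaliplatin is neurotoxic.
    if v.neuropathy_grade >= NEUROPATHY_CUTOFF:
        return "flag: pre-existing neuropathy, review oxaliplatin use"
    if high_risk:
        if v.age > AGE_CUTOFF:
            return "high-risk pathway, age review"
        return "high-risk pathway"
    if v.age > AGE_CUTOFF:
        return "low-risk pathway, age review"
    return "low-risk pathway"


def concordance(model: list[str], physician: list[str]) -> float:
    """Fraction of patients whose recommendation exactly matches the physician's."""
    if len(model) != len(physician) or not model:
        raise ValueError("need two non-empty lists of equal length")
    agree = sum(m == p for m, p in zip(model, physician))
    return agree / len(model)


if __name__ == "__main__":
    # Three made-up patients, for illustration only.
    patients = [
        ExtractedVariables(age=64, neuropathy_grade=0, t_stage="T3", n_stage="N1a"),
        ExtractedVariables(age=72, neuropathy_grade=0, t_stage="T4a", n_stage="N1b"),
        ExtractedVariables(age=58, neuropathy_grade=2, t_stage="T3", n_stage="N2a"),
    ]
    model = [rule_based_recommendation(p) for p in patients]
    physician = [
        "low-risk pathway",
        "high-risk pathway",
        "flag: pre-existing neuropathy, review oxaliplatin use",
    ]
    print(f"{concordance(model, physician):.1%}")  # prints 66.7%
```

Concordance here is simple percent agreement on exact label matches. At the abstract's scale, 93.1% of 144 patients corresponds to about 134 concordant cases (134/144 ≈ 0.931), and 62.5% corresponds to 90 (90/144 = 0.625). A real pipeline would need to normalize free-text recommendations before comparing them, which this sketch does not attempt.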