OpenAlex · Updated hourly · Last updated: 21.04.2026, 01:17

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Abstract 20: An agentic AI workflow for automated, high-fidelity curation of cancer diagnosis and staging from unstructured patient records.

2026 · 0 citations · Cancer Research

Citations: 0
Authors: 9
Year: 2026

Abstract

Purpose: Artificial intelligence (AI)-driven clinical document abstraction (CDA) using large language models (LLMs) is transforming oncology data management by extracting crucial data, such as staging and molecular profiles, from unstructured notes. We validated an agent that autonomously abstracts cancer diagnoses into structured, high-quality real-world evidence (RWE), accelerating research and supporting timely evidence-based treatment decisions with clinical-grade accuracy at scale and operational sustainability.

Methods: Our AI-CDA workflow uses a three-stage hybrid multi-agent AI design for high-accuracy, cost-efficient data abstraction:
1. Pre-screening: two specialized natural language processing (NLP) models fine-tuned on clinical notes scan the entire patient chart, classifying documents and identifying key cancer diagnosis events to select up to 100 relevant documents, reducing the LLM inference cost.
2. Extractor: a non-reasoning LLM (GPT-4.1) extracts all relevant diagnostic information from the selected documents.
3. Structuring and normalization: a GPT-4.1 call synthesizes diagnosis summaries to produce structured clinical fields (e.g., diagnosis date, stage, histology) according to a predefined data model. Finally, structured fields are normalized to a fixed taxonomy by o3-mini.
System performance was benchmarked against manual abstraction on a cohort of 1497 patients (499 breast, 499 lung, 499 pan-cancer) sampled from the Tempus RW database.

Results: The automated workflow performed strongly against manual abstraction. Tissue of origin abstraction achieved a micro F1-score of 0.94, with excellent performance on common cancers (e.g., lung: 0.98, breast: 0.98, colon: 0.97). Metastasis status detection achieved an F1-score of 0.92, histology 0.91, and overall stage 0.83. Initial diagnosis date accuracy was 0.90. A subsequent, adjudicated re-evaluation of discordant cases revealed even higher real-world performance once initial manual labeling errors were corrected: the histology F1-score rose to 0.96 (+0.05) and overall staging rose to > 0.91 (a lift of over 10%). On cost efficiency, pre-screening with two specialized NLP models reduced the average number of documents requiring LLM review by >90% (from 266 to 24 per patient). This lowers LLM cost, mitigates the performance degradation associated with longer context, and prioritizes only the most pertinent information.

Conclusions: This study demonstrates the feasibility and operational sustainability of an agentic AI-CDA workflow for highly accurate cancer data abstraction. This hybrid solution addresses three critical needs: scaling clinical research through automated, high-fidelity data abstraction; supporting timely evidence-based treatment decisions; and achieving operational sustainability by reducing inference cost and resourcing through a hybrid NLP/LLM workflow.

Citation Format: Tian Kang, Elizabeth Dougherty, Shivam Mishra, Ryan Godart, Arpita Saha, Jonathan Wills, Victoria L. Chiou, Maria A. Berezina, Kunal Nagpal. An agentic AI workflow for automated, high-fidelity curation of cancer diagnosis and staging from unstructured patient records [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 20.
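
To make the pre-screening cap and the reported metrics concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `ScoredDocument`, `prescreen`, `micro_f1`, and `relative_reduction`, the `relevance` score, and the `threshold` parameter are all hypothetical, and the abstract does not say how documents are ranked or how unmatched predictions were counted. The sketch assumes one gold label per patient and treats a missing prediction as a false negative.

```python
from typing import NamedTuple, Optional, Sequence


class ScoredDocument(NamedTuple):
    doc_id: str
    relevance: float  # hypothetical score from the pre-screening NLP models


def prescreen(docs: Sequence[ScoredDocument],
              max_docs: int = 100,
              threshold: float = 0.5) -> list[ScoredDocument]:
    """Keep documents scoring at or above the threshold, best first, capped at max_docs."""
    relevant = [d for d in docs if d.relevance >= threshold]
    relevant.sort(key=lambda d: d.relevance, reverse=True)
    return relevant[:max_docs]


def micro_f1(gold: Sequence[str], predicted: Sequence[Optional[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all patients, then compute F1 once."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p == g:
            tp += 1
        else:
            fn += 1              # the gold label was missed
            if p is not None:
                fp += 1          # a wrong label was emitted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity shrank, e.g. documents sent to the LLM."""
    return (before - after) / before
```

With the figures from the abstract, `relative_reduction(266, 24)` evaluates to roughly 0.91, which is consistent with the reported reduction of more than 90% in documents requiring LLM review.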

Topics

Machine Learning in Healthcare · Cardiovascular Health and Risk Factors · Artificial Intelligence in Healthcare and Education