OpenAlex · Updated hourly · Last updated: 23.04.2026, 04:38

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ONCO-RADS–guided Large Language Models for Extraction and Classification of Incidental Findings on Whole-Body Imaging Reports

2026 · 0 citations · Radiology · Imaging · Cancer · Open Access

Citations: 0 · Authors: 15 · Year: 2026

Abstract

Purpose: To evaluate the performance of large language model (LLM)-based strategies for extracting and classifying incidental findings from whole-body (WB) imaging reports, particularly strategies incorporating the Oncologically Relevant Findings Reporting and Data System (ONCO-RADS).

Materials and Methods: In this retrospective bicenter study, the authors included all WB MRI reports from January 2016 to December 2023 at a referral center (internal dataset). Two observers extracted all incidental findings, and patient records were used to confirm final diagnoses. First, the authors evaluated ONCO-RADS performance and the reproducibility of its incidental finding classifications by six radiologists. They then evaluated the accuracy of three LLM-based strategies: (a) a fine-tuned DeBERTa/medical named entity recognition (NER) model; (b) zero-shot LLMs (ChatGPT-o1 [OpenAI], Gemini-2.5-Pro [Google]); and (c) reference-guided prompting of these LLMs using ONCO-RADS. These strategies were then applied to an external dataset of 605 reports spanning multiple imaging techniques (405 WB MRI, 100 fluorodeoxyglucose PET/CT, and 100 chest-abdomen-pelvis CT acquisitions) from January 2022 to January 2025.

Results: The internal dataset included 823 patients (mean age, 63.7 years ± 11.7 [SD]; 457 male patients) with 1488 WB MRI reports. The average interobserver reproducibility of ONCO-RADS incidental finding classifications was excellent (Cohen κ, 0.87). The per-report accuracies of ONCO-RADS-guided LLMs (95.6% [151 of 158] and 86.7% [137 of 158] for ChatGPT-o1 and Gemini-2.5-Pro, respectively) were higher than those of the medical NER model (69.0% [109 of 158]) and the zero-shot LLMs (57.0% [90 of 158] and 70.9% [112 of 158] for ChatGPT-o1 and Gemini-2.5-Pro, respectively) (P < .001). In the external test set (mean age, 60.6 years ± 12.9; 330 male patients), the per-report accuracies of ONCO-RADS-guided ChatGPT-o1 (83.5% [505 of 605]) and Gemini-2.5-Pro (82.0% [496 of 605]) were higher than those of the same models without ONCO-RADS prompting (63.1% [382 of 605] and 61.2% [370 of 605], respectively) and the medical NER model (55.7% [337 of 605]) (P < .001).

Conclusion: Reference-guided prompting of the LLMs ChatGPT-o1 and Gemini-2.5-Pro with ONCO-RADS improved their performance in extracting and classifying incidental findings on WB imaging reports compared with zero-shot prompting and medical NER.

Keywords: Large Language Models, Incidental Findings, Whole-Body MRI

Supplemental material is available for this article. © RSNA, 2026.
