This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Optimizing Large Language Models for Automated Protocoling of Abdominal and Pelvic CT Scans: The Power of Context
Citations: 0
Authors: 7
Year: 2026
Abstract
Background Accurate protocoling is critical for imaging accuracy. Manual protocoling is time-consuming and error-prone.

Purpose To evaluate the performance of large language models (LLMs) in automatically assigning protocols for abdominal and pelvic CT scans after optimization with context engineering and fine-tuning, and to compare performance with that of radiologists in practice.

Materials and Methods This retrospective study included patients with abdominal or pelvic CT scans obtained between January 2024 and June 2024. Requisition data, the human-selected protocol, and training level (resident, fellow, or radiologist) were extracted. Reference standard protocols were defined by radiologists in consultation with institutional guidelines. Context engineering involved detailed prompt instructions using a prompt set with GPT-4o (version 2024-08-06; OpenAI). One subset of patients was reserved for fine-tuning (training and validation sets) and another for testing (internal test set). Two models were tested (prompting-only and fine-tuned). Model-selected protocols and the original human-selected protocols were compared with the reference standard and categorized by blinded radiologists as follows: exact match, equal alternative, reasonable but inferior, or inappropriate. Exact match and equal alternative were considered optimal. Performance of the models and radiologists was compared using the McNemar test.

Results This study included 1448 patients (mean age, 61 years ± 17 [SD]; 728 female patients). GPT-4o with prompting only selected optimal protocols more frequently than humans (96.2% [527 of 548 patients] vs 88.3% [484 of 548 patients]; <i>P</i> < .001), but there was no evidence of a difference in inappropriate protocols (1.3% [seven of 548 patients] vs 2.4% [13 of 548 patients]; <i>P</i> = .21). Fine-tuning GPT-4o did not improve the proportion of optimal protocols over prompting only (96.2% [527 of 548 patients] vs 96.2% [527 of 548 patients]; <i>P</i> > .99). In subgroup analyses, the proportion of protocols matching the reference standard was similar among radiologists (79.4% [173 of 218 patients]), fellows (74.9% [164 of 219 patients]), and residents (72.1% [80 of 111 patients]; <i>P</i> = .30).

Conclusion For protocoling of abdominal and pelvic CT scans, the LLM GPT-4o selected optimal protocols more frequently than radiologists when optimized with detailed prompting, and fine-tuning the model did not further improve performance.

© RSNA, 2026 <i>Supplemental material is available for this article.</i>
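The McNemar test named in Materials and Methods compares paired binary outcomes (here, whether the model and the human each selected an optimal protocol for the same patient) using only the discordant pairs. A minimal stdlib-only Python sketch is below; the discordant-pair counts in the example are hypothetical, since the abstract reports only the marginal totals (527 vs 484 of 548 patients), not the per-pair cross-tabulation:

```python
import math

def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-square test for paired binary outcomes.

    b: pairs where only the first rater (e.g., the model) was optimal.
    c: pairs where only the second rater (e.g., the human) was optimal.
    Returns (chi-square statistic, two-sided p-value), 1 degree of freedom.
    """
    stat = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical discordant counts for illustration only:
stat, p = mcnemar(50, 7)
```

In practice one would use `statsmodels.stats.contingency_tables.mcnemar`, which also offers an exact binomial variant for small discordant counts; the formula above is the classic large-sample chi-square version.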
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations