OpenAlex · Updated hourly · Last updated: 19.03.2026, 12:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A framework for contextual mass categorization leveraging explainable AI.

2026 · 0 citations · Journal of Clinical Oncology

0 citations · 5 authors · 2026

Abstract

657 Background: Large healthcare systems often use a centralized radiology report database that facilitates data mining for care delivery, such as artificial intelligence (AI)-assisted navigation of newly suspected cancer. Since 2023, our institution has used a natural language processing (NLP) model to identify patients with suspected pancreatic cancer. Whereas large language models (LLMs) tend to be “black boxes,” hindering users’ understanding of their analytical process, entity extraction large language models (eeLLMs) give users clearer insight into how the model reaches its decisions. With this enhanced transparency and control, we hypothesize that eeLLMs will accurately categorize reports into highly suspicious, moderately suspicious, and occult masses, providing greater insight into why a report was characterized as suspicious.

Methods: Radiology reports initially flagged as suspicious by an NLP model were analyzed by an eeLLM for three binary classifications: highly suspicious for pancreatic cancer, moderately suspicious, and presence of an occult mass. For each report, the LLM performed an independent binary assessment for each of these characteristics. The performance of each model was calculated per identified mass instance using precision and recall analysis, with an F1 score computed for the model.

Results: The eeLLMs’ performance in characterizing pancreatic masses was evaluated across 78 radiology reports (Table 1). The models showed varying precision, recall, and F1 scores, independent of their version number. Notably, not all models could provide data for all radiology reports.

Conclusions: eeLLM performance was generally good compared to gold-standard manual review. Model performance varied, yet multiple eeLLM models provide a path to improving precision without affecting recall. This suggests that eeLLMs may be able to serve as alternatives to traditional NLP models for mass identification.
Future directions include testing eeLLMs on hospital-wide datasets for early identification of pancreatic masses and developing prompts to assess additional pancreatic cancer criteria.

Table 1. Analysis of various large language models on identification of pancreatic mass radiology report features. Each model was analyzed using precision, recall, and F1 score. N refers to the number of radiology reports the model was able to provide data for.

| Model | Highly suspicious masses (Precision / Recall / F1) | Moderately suspicious masses (Precision / Recall / F1) | Occult masses (Precision / Recall / F1) |
| --- | --- | --- | --- |
| GPT-5 (N=78) | 68.4% / 100% / 0.813 | 84.6% / 78.6% / 0.814 | 90.9% / 95.2% / 0.930 |
| GPT-4.1 (N=78) | 76.5% / 100% / 0.867 | 100% / 92.9% / 0.962 | 100% / 85.7% / 0.923 |
| GPT-4o (N=76) | 92.3% / 100% / 0.960 | 85.7% / 100% / 0.923 | 100% / 80.0% / 0.889 |
| Claude 4.1 Opus (N=71) | 80.0% / 100% / 0.889 | 77.8% / 70.0% / 0.737 | 100% / 50.0% / 0.667 |
| Claude 4 Sonnet (N=67) | 71.4% / 83.3% / 0.769 | 100% / 84.6% / 0.917 | 100% / 88.9% / 0.941 |
| Gemini 2.0 Flash (N=78) | 86.7% / 100% / 0.929 | 84.6% / 78.6% / 0.815 | 100% / 66.7% / 0.800 |
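The precision, recall, and F1 metrics reported per mass category follow the standard definitions for binary classification. A minimal sketch of the computation (the true/false-positive counts below are hypothetical illustrations, not taken from the study's data):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from binary-classification counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)                       # fraction of flagged reports that were correct
    recall = tp / (tp + fn)                          # fraction of true cases that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts: 13 correctly flagged reports, 6 false alarms, no misses.
p, r, f1 = precision_recall_f1(tp=13, fp=6, fn=0)
print(f"precision={p:.1%} recall={r:.1%} F1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, reducing false positives raises precision and F1 without touching recall, which is the improvement path the Conclusions describe.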
