This is an overview page with metadata for this scientific work. The full article is available from the publisher.
A framework for contextual mass categorization leveraging explainable AI.
Citations: 0
Authors: 5
Year: 2026
Abstract
657

Background: Large healthcare systems often use a centralized radiology report database, which facilitates data mining for care delivery, such as artificial intelligence (AI)-assisted navigation of newly suspected cancer. Since 2023, our institution has used a natural language processing (NLP) model to identify patients with suspected pancreatic cancer. Whereas large language models (LLMs) tend to be “black boxes” that obscure their analytical process from users, entity extraction large language models (eeLLMs) provide clearer insight into how the model reaches its decisions. With this enhanced transparency and control, we hypothesized that eeLLMs would accurately categorize reports into highly suspicious, moderately suspicious, and occult masses, while providing greater insight into why a report was characterized as suspicious.

Methods: Radiology reports initially flagged as suspicious by an NLP model were analyzed by an eeLLM for three binary classifications: highly suspicious for pancreatic cancer, moderately suspicious, and presence of an occult mass. For each report, the LLM performed an independent binary assessment for each of these characteristics. The performance of each model was calculated per identified mass instance using precision and recall, with an F1 score for the model.

Results: The eeLLMs’ performance in characterizing pancreatic masses was evaluated across 78 radiology reports (Table 1). The models showed varying precision, recall, and F1 scores, independent of their version number. Notably, not all models could provide data for all radiology reports.

Conclusions: eeLLM performance was generally good compared with gold-standard manual review. Model performance varied, yet multiple eeLLM models provide a path to improving precision without affecting recall. This suggests that eeLLMs may be able to serve as alternatives to traditional NLP models for mass identification.
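As a hedged illustration (not the authors' code), the per-label evaluation described in Methods — precision, recall, and F1 computed over independent binary assessments scored against manual gold-standard review — can be sketched as:

```python
# Minimal sketch of the evaluation described in Methods: each report receives
# an independent binary assessment per label, scored against gold-standard
# manual review. Function and variable names are illustrative, not from the study.

def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 for one binary label across reports."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(not p and g for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: 5 reports, one label (e.g. "highly suspicious").
pred = [True, True, False, True, False]
gold = [True, False, False, True, True]
p, r, f1 = precision_recall_f1(pred, gold)
```

In the toy data there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 2/3.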
Future directions include testing eeLLMs on hospital-wide datasets for early identification of pancreatic masses and developing prompts to assess additional pancreatic cancer criteria.

Table 1. Performance of each large language model on identification of pancreatic mass radiology report features, analyzed using precision (P), recall (R), and F1 score. N is the number of radiology reports the model was able to provide data for.

Model | Highly suspicious masses (P / R / F1) | Moderately suspicious masses (P / R / F1) | Occult masses (P / R / F1)
GPT-5 (N=78) | 68.4% / 100% / 0.813 | 84.6% / 78.6% / 0.814 | 90.9% / 95.2% / 0.930
GPT-4.1 (N=78) | 76.5% / 100% / 0.867 | 100% / 92.9% / 0.962 | 100% / 85.7% / 0.923
GPT-4o (N=76) | 92.3% / 100% / 0.960 | 85.7% / 100% / 0.923 | 100% / 80.0% / 0.889
Claude 4.1 Opus (N=71) | 80.0% / 100% / 0.889 | 77.8% / 70.0% / 0.737 | 100% / 50.0% / 0.667
Claude 4 Sonnet (N=67) | 71.4% / 83.3% / 0.769 | 100% / 84.6% / 0.917 | 100% / 88.9% / 0.941
Gemini 2.0 Flash (N=78) | 86.7% / 100% / 0.929 | 84.6% / 78.6% / 0.815 | 100% / 66.7% / 0.800
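The F1 scores in Table 1 follow the standard harmonic-mean relation F1 = 2PR/(P + R). As an illustrative check (values taken from the table, here the GPT-4o row for highly suspicious masses):

```python
# Recompute F1 from the reported precision and recall for one table row:
# GPT-4o, highly suspicious masses (P = 92.3%, R = 100%, reported F1 = 0.960).
p, r = 0.923, 1.0
f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
```

Rounded to three decimals this reproduces the reported 0.960; the same relation can be used to sanity-check any row of the table.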
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations