This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial intelligence (AI) as a practical decision support tool for oncologists: Genitourinary (GU) cancer cases.
Citations: 0
Authors: 6
Year: 2026
Abstract
565 Background: Multidisciplinary reviews (MDR) can significantly impact the management of oncologic patients, yet they are often underutilized. We previously presented more than 400 anonymized oncologic cases, including genitourinary (GU) cancer cases, to an MDR panel consisting of specialists in medical oncology, surgical oncology, and radiation oncology. Here, we compare MDR recommendations to those produced by 3 different foundation AI models. Methods: We selected 35 complex GU cancer cases reviewed by MDR panels between 2020 and 2021 from a larger database that included several tumor types. These were then evaluated by OpenAI's ChatGPT 4.5, Anthropic's Claude Opus 4, and Google's Gemini Ultra using PrecisCa's proprietary prompting method. Recommendations from each system were scored on a 1-5 scale (with 5 being the highest) for completeness, reasoning, clarity, menu of options, recency, and relevance, and compared with those from the MDR panel. A maximum score of 30 points per scenario and 1,050 points overall was possible. Final AI recommendations were also compared with current National Comprehensive Cancer Network (NCCN) guidelines. The reverse comparison (i.e., additional AI options that the experts had missed) was not conducted because treatment recommendations had changed in the intervening 4 years. Results: Patient characteristics, histology, and aggregate competence scores are shown (Table 1). Across all AI models, performance was best in testicular cancer cases and worst in bladder cancer cases. For testicular cancer cases, recency and relevance scores were highest, while scores for reasoning, clarity, and menu of options were lowest. For bladder cancer cases, clarity scores were highest and recency lowest. Overall, the AI systems excelled in relevance but were weaker in completeness and recency.
While the concordance and competence ratings varied somewhat across the 3 AI models, all of them showed good concordance with expert opinion. Discordant cases revealed only minor differences that would not have significantly impacted patient management. Conclusions: This study demonstrates a high degree of concordance between 3 leading AI models and expert panel decisions for common, complex GU cancer clinical scenarios. The findings suggest that it is not premature to incorporate AI as a decision support tool, in conjunction with human specialists, in daily practice.

Table 1. Patient characteristics, histology, and aggregate competence scores (n=35).
Median age, years (range): 52.5 (25-80)
Histology:
  Bladder: 7 (20%)
  Kidney: 11 (31.4%)
  Prostate: 15 (42.9%)
  Testicular: 2 (5.7%)
Aggregate/median competence scores (range):
  ChatGPT 4.5: 880/23.5 (17-30)
  Claude Opus 4: 931/23.5 (17-30)
  Gemini Ultra: 889/24 (18-30)
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,231 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,084 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,444 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,423 citations