This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Second opinions and treatment guidance in non- or minimally metastatic testicular cancer: Is ChatGPT 5 non-inferior to expert recommendations?
Citations: 0
Authors: 10
Year: 2026
Abstract
596 Background: Testicular cancer is rare. Many cases presented to the German online second-opinion platform (eKonsil) involve non- or minimally metastatic stages. Expert reviews aim to optimize the quality of care. Large language models (LLMs) such as ChatGPT version 5 (GPT5) could support this process if their recommendations are shown to be non-inferior to expert consensus.
Methods: Eleven eKonsil cases with primary non- or minimally metastatic disease were evaluated. Three testicular cancer experts (AH, JH, HS) generated consensus treatment recommendations based on the EAU guidelines. In September 2025, the same cases were entered into GPT5 with reference to the EAU guidelines on different days, in different countries, and by different individuals. Three urologists independently assessed the recommendations by GPT5 and the expert consensus using a four-point Likert scale based on 21 validated evaluation criteria.
Results: We obtained 99 evaluations of adherence to expert opinion. Expert recommendations for direct treatment were available for 81 evaluations. The primary therapy suggested by GPT5 was acceptable (no or insignificant deviations from the experts) in 70% (57/81). GPT5's adherence to expert opinion was similar for localized and metastasized cases (3.14 vs. 3.38; Wilcoxon p = 0.38). Adherence differed significantly across the suggested therapeutic alternatives (chi-square p < 0.01): chemotherapeutic compounds were suggested correctly in 85% (74/87), radiation in 78% (47/60), active surveillance in 76% (50/66), and surgery in 59% (30/51). GPT5 acceptably identified missing information before making therapeutic suggestions in 82% (22/27) of relevant cases. Where information was missing, it suggested an acceptable diagnostic path in 74% (20/27). Performance in identifying informational deficits did not vary by metastatic status (chi-square p = 0.19), and performance in suggesting diagnostics was no better than in suggesting primary therapy (3.30 vs. 3.14; Wilcoxon p = 0.48). GPT5 did not reach clinically acceptable ratings in more than 95% of evaluations for any metric and therefore did not meet the criteria for non-inferiority to experts.
Conclusions: GPT5-generated second-opinion recommendations for patients with non- or minimally metastatic testicular cancer cannot be assumed to be non-inferior to expert consensus. Caution is needed when using GPT5 as an assistant to screen cases for missing information or to generate therapeutic suggestions, particularly when surgical paths are involved. This study shows that LLMs, even when provided with guidelines, require more training, and that expert validation currently remains essential.
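The proportions reported in the abstract can be rechecked from the stated counts. The following is a minimal sketch, not the authors' analysis. It assumes that the non-inferiority criterion means "more than 95% acceptable ratings" and that the comparison across therapeutic alternatives was a Pearson chi-square test on the four reported count pairs, with evaluations treated as independent. All function and variable names are hypothetical. Recomputed percentages agree with the reported ones to within one percentage point.

```python
import math

# (acceptable or correct ratings, total ratings) as reported in the abstract.
REPORTED = {
    "primary therapy": (57, 81),
    "chemotherapy": (74, 87),
    "radiation": (47, 60),
    "active surveillance": (50, 66),
    "surgery": (30, 51),
    "missing information identified": (22, 27),
    "diagnostic path": (20, 27),
}

# Assumed reading of the criterion: more than 95% acceptable ratings.
THRESHOLD = 0.95


def proportions(counts):
    """Return the share of acceptable ratings per metric."""
    return {name: ok / total for name, (ok, total) in counts.items()}


def metrics_above_threshold(counts, threshold=THRESHOLD):
    """List the metrics whose acceptable share exceeds the threshold."""
    return [name for name, p in proportions(counts).items() if p > threshold]


def chi_square_four_groups(groups):
    """Pearson chi-square for a 4x2 table given as (correct, total) pairs.

    Returns (statistic, p_value). The p-value uses the closed-form
    survival function of the chi-square distribution with 3 degrees
    of freedom, so no third-party library is needed.
    """
    if len(groups) != 4:
        raise ValueError("this sketch only handles exactly four groups (df = 3)")
    grand_total = sum(total for _, total in groups)
    share_correct = sum(ok for ok, _ in groups) / grand_total
    stat = 0.0
    for ok, total in groups:
        cells = (
            (ok, total * share_correct),                # observed vs. expected correct
            (total - ok, total * (1 - share_correct)),  # observed vs. expected incorrect
        )
        for observed, expected in cells:
            stat += (observed - expected) ** 2 / expected
    p_value = (math.erfc(math.sqrt(stat / 2))
               + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))
    return stat, p_value


if __name__ == "__main__":
    for name, p in proportions(REPORTED).items():
        print(f"{name}: {p:.1%}")
    print("metrics above 95%:", metrics_above_threshold(REPORTED))
    therapies = [REPORTED[k] for k in
                 ("chemotherapy", "radiation", "active surveillance", "surgery")]
    stat, p_value = chi_square_four_groups(therapies)
    print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Under these assumptions no metric exceeds the 95% threshold, and the chi-square p-value falls below 0.01, which is consistent with the reported results.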