This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Second opinions and treatment guidance in non- or minimally metastatic testicular cancer: Is ChatGPT 5 non-inferior to expert recommendations?

2026 · 0 citations · Journal of Clinical Oncology

Abstract

596

Background: Testicular cancer is rare. Many cases presented to the German online second-opinion platform (eKonsil) involve non- or minimally metastatic stages. Expert reviews aim to optimize care quality. Large language models (LLMs) such as ChatGPT version 5 (GPT5) may support this process if their recommendations are shown to be non-inferior to expert consensus.

Methods: Eleven eKonsil cases with primary non- or minimally metastatic disease were evaluated. Consensus treatment recommendations were generated by three testicular cancer experts (AH, JH, HS) based on the EAU guidelines. The same cases were entered into GPT5 with reference to the EAU guidelines on different days, in different countries, and by different individuals in September 2025. Recommendations by GPT5 and by expert consensus were assessed independently by three urologists using a four-point Likert scale based on 21 validated evaluation criteria.

Results: We obtained 99 evaluations of adherence to expert opinion. Expert recommendations for direct treatment were available for 81 evaluations. Primary therapy suggested by GPT5 was acceptable (no or insignificant deviations from experts) in 70% (57/81). GPT5's adherence to expert opinion was similar for localized and metastatic cases (3.14 vs. 3.38; Wilcoxon p = 0.38). Adherence differed significantly by suggested therapeutic alternative (chi-squared p < 0.01): chemotherapeutic compounds were suggested correctly in 85% (74/87), radiation in 78% (47/60), active surveillance in 76% (50/66), and surgery in 59% (30/51). GPT5 acceptably identified missing information ahead of therapeutic suggestions in 82% (22/27) of relevant cases. Where information was missing, an acceptable diagnostic path was suggested in 74% (20/27). Performance in identifying informational deficits did not vary by metastatic status (chi-squared p = 0.19), and performance in suggesting diagnostics was no better than in suggesting primary therapy (3.30 vs. 3.14; Wilcoxon p = 0.48). GPT5 did not reach clinically acceptable ratings in more than 95% of evaluations for any metric and therefore did not meet the criteria for non-inferiority to experts.

Conclusions: GPT5-generated second-opinion recommendations for patients with non- or minimally metastatic testicular cancer cannot be assumed to be non-inferior to expert consensus. Caution is needed when using GPT5 as an assistant to screen cases for missing information or to generate therapeutic suggestions, particularly when surgical options are involved. This study shows that LLMs, even when provided with guidelines, require more training, and expert validation currently remains essential.
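The comparison against the 95% bar can be retraced from the counts reported in the Results. The following is a minimal sketch, not the authors' analysis: it assumes the criterion is simply "more than 95% of ratings acceptable" (the abstract does not spell out the formal non-inferiority test), and the names `reported`, `THRESHOLD`, and `summarize` are hypothetical.

```python
# Counts as reported in the abstract: metric -> (acceptable ratings, total ratings)
reported = {
    "primary therapy": (57, 81),
    "chemotherapy": (74, 87),
    "radiation": (47, 60),
    "active surveillance": (50, 66),
    "surgery": (30, 51),
    "missing information identified": (22, 27),
    "diagnostic path suggested": (20, 27),
}

# Assumption: a metric passes only if more than 95% of its ratings are acceptable.
THRESHOLD = 95.0


def summarize(counts, threshold=THRESHOLD):
    """Return (metric, percent acceptable, passes threshold) for each metric."""
    rows = []
    for metric, (acceptable, total) in counts.items():
        percent = 100.0 * acceptable / total
        rows.append((metric, percent, percent > threshold))
    return rows


summary = summarize(reported)
for metric, percent, passes in summary:
    print(f"{metric}: {percent:.1f}% acceptable, passes 95% bar: {passes}")
```

Under this simplified reading, none of the listed metrics passes the bar, which is consistent with the abstract's statement that non-inferiority was not met.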

Topics

Artificial Intelligence in Healthcare and Education; Testicular diseases and treatments; Topic Modeling