This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Abstract PD11-09: The St. Gallen AI Consensus - Should AI have a vote?

2026 · 0 citations · Clinical Cancer Research

Abstract

Introduction: Early breast cancer treatment requires complex individual risk assessment integrating tumor biology, staging, and patient factors. This complexity creates heterogeneous treatment approaches globally. Expert panels provide consensus guidance, but artificial intelligence may offer advantages including real-time access to all published data and freedom from individual, institutional, cultural, and emotional biases. This study represents the first comprehensive comparison between AI-generated breast cancer treatment recommendations and expert panel consensus across early-stage breast cancer management.

Methods: We analyzed 80 distinct clinical scenarios from the final voting of the 2025 St. Gallen International Breast Cancer Conference. We included breast/axillary surgery, radiation therapy, systemic treatment, elderly care, and recurrence management. We excluded scenarios focusing on genetic risk (e.g. BRCA testing) and DCIS. Clinical scenarios were presented to four Large Language Models (LLMs): Claude Sonnet 4, Google Gemini 2.5 Flash, ChatGPT-4o, and DeepSeek-V3. Dates of interaction were July 8th and 9th, 2025. We designed three sequential prompts to (1) answer all clinical scenarios by selecting the single best option, (2) compare AI responses with expert panel percentages and identify disagreements, and (3) analyze disagreements with evidence-based rationales for both AI and expert positions. The primary outcome was the agreement rate between AI and expert consensus, defined as identical answers for standard questions.

Results: Overall agreement rates varied substantially: ChatGPT 60.0% (48/80), Gemini 57.5% (46/80), DeepSeek 48.8% (39/80), Claude 26.3% (21/80). Agreement differed dramatically by clinical category, ranging from 70.8% (endocrine therapy, n=6) to 25.0% (radiation therapy, n=5). Further, LLMs showed 56.8% agreement for axillary surgery (n=11), 58.3% for sentinel lymph node omission (n=3), and 35.9% for genomic risk scores (n=16). In a second step, we unblinded each LLM to the expert opinion and to the answers of the other LLMs and prompted it to review its discordant answers. The LLMs revised their initial answers and accepted the panelists’ recommendation for 50% (ChatGPT, 16/32) to 100% (Claude, 59/59) of these questions. Still, 19% (DeepSeek, 8/41) to 50% (ChatGPT, 16/32) of discordant answers remained unchanged. Further in-depth analysis regarding the evidence-based reasoning of AI will be presented at the meeting.

Conclusion: There was poor alignment between AI and expert medical consensus in early breast cancer treatment decisions. This reveals current limitations in AI's ability to integrate complex and multifactorial clinical reasoning. At the same time, AI offers an unbiased and comprehensive data-driven perspective, effectively serving as a critical mirror for expert panels.

Citation Format: A. Oseledchyk, W. Weber, B. Kasenda. The St. Gallen AI Consensus - Should AI have a vote? [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PD11-09.
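
The overall agreement rates in the Results can be checked from the counts the abstract reports. The following is a minimal sketch, not the authors' analysis code: it assumes agreement is simply the number of matching answers divided by the 80 scenarios, and the names `concordant`, `agreement_rate`, and `discordant` are illustrative only.

```python
# Counts of answers identical to the expert panel's, as reported in the abstract.
TOTAL_SCENARIOS = 80
concordant = {"ChatGPT": 48, "Gemini": 46, "DeepSeek": 39, "Claude": 21}


def agreement_rate(matches: int, total: int) -> float:
    """Percentage of scenarios in which the model's answer equals the panel's.

    Assumption: a plain proportion, with no weighting by panel vote share.
    """
    return 100.0 * matches / total


rates = {model: agreement_rate(k, TOTAL_SCENARIOS) for model, k in concordant.items()}

# Scenarios where the model disagreed with the panel in the first step.
discordant = {model: TOTAL_SCENARIOS - k for model, k in concordant.items()}

for model in concordant:
    print(f"{model}: {rates[model]:.2f}% agreement, {discordant[model]} discordant")
```

Under this assumption, the discordant counts for ChatGPT, DeepSeek, and Claude (32, 41, and 59) match the denominators quoted for the second, unblinded step.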

Institutions

University Hospital of Basel (CH)

Topics

Artificial Intelligence in Healthcare and Education
Cardiovascular Health and Risk Factors
Radiomics and Machine Learning in Medical Imaging
