
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

St. Gallen International Breast Cancer Consensus-Based Clinical Decision Validation: Concordance Assessment Between Deep Large Language Model Outputs and Global Expert Panel Recommendations

2026 · 1 citation · Annals of Surgical Oncology · Open Access

Citations: 1 · Authors: 12 · Year: 2026

Abstract

BACKGROUND: The newly developed large language model (LLM) DeepSeek has shown potential for application in other medical fields. However, few systematic studies have assessed its concordance with international expert consensus in breast cancer or compared its performance with that of leading models such as Gemini 2.0 Pro and ChatGPT-4o.

MATERIALS AND METHODS: A total of 139 consensus questions from the 19th St. Gallen International Breast Cancer Conference (SG-BCC) were included in the analysis. Each model was asked to answer each consensus question five times. The DeepSeek model was compared with the expert panel consensus in terms of concordance rate, answer robustness, the Pearson correlation coefficient r for non-binary questions, and the absolute proportion difference for binary questions. In parallel, a side-by-side comparison was made with the earlier LLMs Gemini 2.0 Pro and ChatGPT-4o.

RESULTS: The overall concordance rate between DeepSeek-V3 and the expert panel consensus was 63.31%, and the average answer robustness of DeepSeek-V3 (i.e., its self-consistency across repeated queries) was 86.69%. DeepSeek-V3 performed similarly to Gemini 2.0 Pro and ChatGPT-4o in terms of the concordance rate of the most frequent answers (p = 0.849). Answer robustness differed significantly among the models (p < 0.001), with DeepSeek-V3 significantly outperforming Gemini 2.0 Pro (p = 0.005) and ChatGPT-4o (p < 0.001).

CONCLUSIONS: DeepSeek models showed moderate concordance with the consensus of the breast cancer expert panel and a significant advantage in answer robustness, suggesting that DeepSeek has great application potential in clinical decision-making for breast cancer.
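
The two headline metrics in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "concordance of the most frequent answers" means the modal answer over the repeated runs matches the panel consensus, and that "robustness" is the share of runs agreeing with that modal answer; the abstract does not give the exact definitions. The function names and the toy data are hypothetical.

```python
from collections import Counter


def modal_answer(answers):
    """Return the most frequent answer and its share among the repeated runs."""
    counts = Counter(answers)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(answers)


def summarize(runs_by_question, consensus_by_question):
    """Return (concordance rate, mean robustness) over all questions.

    Assumed definitions (not taken from the paper):
    - a question is concordant if the modal answer equals the panel consensus;
    - robustness of a question is the share of runs matching the modal answer.
    """
    concordant = 0
    robustness_total = 0.0
    for qid, answers in runs_by_question.items():
        answer, share = modal_answer(answers)
        robustness_total += share
        if answer == consensus_by_question[qid]:
            concordant += 1
    n = len(runs_by_question)
    return concordant / n, robustness_total / n


# Made-up toy data: three questions, five repeated answers each.
runs = {
    "q1": ["yes", "yes", "yes", "yes", "yes"],
    "q2": ["no", "yes", "no", "no", "no"],
    "q3": ["A", "B", "A", "C", "A"],
}
consensus = {"q1": "yes", "q2": "yes", "q3": "A"}

concordance, robustness = summarize(runs, consensus)
print(f"concordance = {concordance:.2%}, mean robustness = {robustness:.2%}")
```

Under these assumed definitions the two metrics are independent: a model can answer very consistently (high robustness) while still disagreeing with the panel, which is the pattern the abstract reports for DeepSeek-V3.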

Related works