OpenAlex · Updated hourly · Last updated: 19.03.2026, 13:46

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Assessing the performance of large language model Artificial Intelligence in thyroid cancer multidisciplinary team (MDT) decision-making

2025 · 0 citations · British Journal of Surgery · Open Access
Citations: 0 · Authors: 5 · Year: 2025

Abstract

Background: Large language models (LLMs) have potential to augment clinical decision-making. The performance of three leading LLMs (ChatGPT, MetaAI, and DeepSeek) was evaluated in thyroid cancer MDT decision-making.

Method: Forty clinical cases (20 simulated, 20 genuine; median age 45 (25–74); 26 female) were discussed in a regional thyroid cancer MDT to provide the reference decisions. The cases were then blind-assessed by the three LLMs, which had been trained on contemporary guidelines, using uniform prompts. LLM outcomes were compared with the reference decisions via a 4-point concordance score. Thematic analysis was performed.

Results: ChatGPT demonstrated the highest overall concordance score, 97/120 (36 cases concordant or near-concordant); DeepSeek scored 93/120 (33 concordant or near-concordant); and MetaAI performed least well, at 87/120. Despite identical guideline references, the LLMs differed in how they interpreted and applied them. ChatGPT offered the greatest nuance, with outputs that closely mirrored MDT phrasing and reasoning, especially in complex or borderline cases. ChatGPT and DeepSeek explicitly acknowledged patient choice, but ChatGPT more consistently contextualised it within guideline-compliant treatment options. MetaAI demonstrated more variability, with several examples of overtreatment or misalignment with current clinical practice. None of the LLMs recommended more recent treatment options for poorly differentiated or anaplastic disease, such as neoadjuvant or combination chemotherapy.

Conclusion: LLMs showed good concordance with clinical MDT decisions, even in complex real-world cases. However, despite receiving the same prompts, they differed in their reasoning. ChatGPT's nuanced, context-aware outputs appear most promising as a decision-support tool. Finally, because LLMs currently lack an intrinsic capability for continual learning, they have clear limitations, for example in keeping pace with some areas of advancing clinical practice.
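To put the reported totals on a common footing, the scores out of 120 can be expressed as percentages of the maximum. The sketch below does only that arithmetic. It assumes the 4-point concordance score runs from 0 to 3 per case, which would be consistent with 40 cases giving a maximum of 120, but the abstract does not state the scale's endpoints. The function and constant names are illustrative and are not taken from the paper.

```python
# Totals reported in the abstract (points out of 120 across 40 cases).
REPORTED_TOTALS = {"ChatGPT": 97, "DeepSeek": 93, "MetaAI": 87}

N_CASES = 40
# Assumption: the 4-point scale is scored 0..3 per case, so 40 * 3 = 120.
MAX_PER_CASE = 3


def concordance_percent(total, n_cases=N_CASES, max_per_case=MAX_PER_CASE):
    """Return a total score as a percentage of the maximum attainable score."""
    max_total = n_cases * max_per_case
    if not 0 <= total <= max_total:
        raise ValueError(f"total {total} is outside 0..{max_total}")
    return 100.0 * total / max_total


def rank_models(totals):
    """Return (model, percent) pairs sorted from highest to lowest percent."""
    pairs = [(name, round(concordance_percent(t), 1)) for name, t in totals.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, pct in rank_models(REPORTED_TOTALS):
        print(f"{name}: {pct}% of maximum concordance score")
```

Under that assumption, the totals correspond to roughly 80.8% (ChatGPT), 77.5% (DeepSeek), and 72.5% (MetaAI) of the maximum score.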

Topics

Artificial Intelligence in Healthcare and Education · Thyroid Cancer Diagnosis and Treatment · Explainable Artificial Intelligence (XAI)