This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the performance of large language model Artificial Intelligence in thyroid cancer multidisciplinary team (MDT) decision-making
Citations: 0
Authors: 5
Year: 2025
Abstract
Background: Large language models (LLMs) have potential in augmenting clinical decision-making. The performance of three leading LLMs (ChatGPT, MetaAI, and DeepSeek) was evaluated in thyroid cancer MDT decision-making.
Method: Forty clinical cases (20 simulated, 20 genuine; median age 45 (25–74); 26 female) were discussed in a regional thyroid cancer MDT to provide the reference decisions. The cases were then blind-assessed, using uniform prompts, by three LLMs that had been trained on contemporary guidelines. LLM outcomes were compared with the reference decisions via a 4-point concordance score. Thematic analysis was performed.
Results: ChatGPT demonstrated the highest overall concordance score, 97/120 (36 cases concordant/near-concordant); DeepSeek scored 93/120 (33 concordant/near-concordant); and MetaAI performed least well, at 87/120. Despite identical guideline references, the LLMs differed in how they interpreted and applied the guidelines. ChatGPT offered the greatest nuance, with outputs that closely mirrored MDT phrasing and reasoning, especially in complex or borderline cases. ChatGPT and DeepSeek explicitly acknowledged patient choice, but ChatGPT more consistently contextualised it within guideline-compliant treatment options. MetaAI demonstrated more variability, with several examples of overtreatment or misalignment with current clinical practice. None of the LLMs recommended more recent treatment options for poorly-differentiated/anaplastic disease, such as neoadjuvant or combination chemotherapy.
Conclusion: LLMs showed good concordance with clinical MDT decisions, even in complex real-world cases. However, despite receiving the same prompts, they differed in their reasoning. ChatGPT's nuanced, context-aware outputs appear most promising as a decision-support tool. Finally, because LLMs currently lack an intrinsic capability for continual learning, they have clear limitations, for example in keeping pace with some areas of advancing clinical practice.
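As a rough illustration of the arithmetic behind the reported scores, the sketch below converts each total into a percentage of the maximum. It assumes that the maximum of 120 corresponds to 40 cases scored 0 to 3 each, which the abstract implies but does not state. The names `MAX_SCORE`, `totals`, and `percent_of_max` are hypothetical and not taken from the paper.

```python
# Totals as reported in the abstract (score out of 120).
# Assumption: 120 = 40 cases x 3 points maximum on the 4-point scale.
N_CASES = 40
MAX_POINTS_PER_CASE = 3  # assumed: 4-point scale scored 0-3
MAX_SCORE = N_CASES * MAX_POINTS_PER_CASE

totals = {"ChatGPT": 97, "DeepSeek": 93, "MetaAI": 87}


def percent_of_max(score, max_score=MAX_SCORE):
    """Return the score as a percentage of the maximum, to one decimal place."""
    return round(100 * score / max_score, 1)


for model, score in totals.items():
    print(f"{model}: {score}/{MAX_SCORE} = {percent_of_max(score)}%")
```

Under that assumption, the three totals correspond to roughly 80.8%, 77.5%, and 72.5% of the maximum score.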