This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
ChatGPT and Claude in hand surgery: an explanatory evaluation of clinical decision support on common surgical cases
0
Citations
7
Authors
2025
Year
Abstract
INTRODUCTION: Large language models (LLMs) have gained increasing popularity in several medical disciplines. In orthopedic research, however, their integration into routine practice has been questioned, as they do not seem to outperform experienced clinicians. Meanwhile, research on the role of artificial intelligence in hand surgery remains limited. This study aims to evaluate two LLMs commonly used in medicine, Generative Pre-trained Transformer (ChatGPT) and Claude, in the clinical hand surgery setting.
METHODS: Ten questions pertinent to common hand surgical diagnoses were formulated as prompts and entered into ChatGPT and Claude in a systematic manner. The generated responses were anonymously evaluated by hand surgeons, who assessed the quality of the responses according to the QUEST criteria. Gwet's AC2 was used to evaluate the agreement between raters.
RESULTS: In general, ChatGPT and Claude performed statistically similarly across the dimensions of QUEST, namely (1) Quality of information, (2) Understanding and reasoning, (3) Expression style and persona, (4) Safety and harm, and (5) Trust and confidence, although with relatively modest scores. Agreement between hand surgeons across all measurements was low according to Gwet's AC2 (0.29).
CONCLUSIONS: ChatGPT and Claude perform similarly when provided with various common hand surgery related questions. However, they demonstrate significant limitations in clinical accuracy and reliability, which are the foundation of patient safety, treatment efficiency and evidence-based practice. Furthermore, as the function of ChatGPT and Claude seems to differ between individual hand surgeons, these LLMs in their current state are not suitable for routine clinical use in hand surgery.
LEVEL OF EVIDENCE: V.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,626 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,843 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations