This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparison of ChatGPT's Diagnostic and Management Accuracy of Foot and Ankle Bone–Related Pathologies to Orthopaedic Surgeons
Citations: 1
Authors: 6
Year: 2025
Abstract
INTRODUCTION: The steep rise in utilization of large language model chatbots, such as ChatGPT, has spilled into medicine in recent years. The newest version of ChatGPT, ChatGPT-4, has passed medical licensure examinations and, specifically in orthopaedics, has performed at the level of a postgraduate year three orthopaedic surgery resident on the Orthopaedic In-Service Training Examination question bank sets. The purpose of this study was to evaluate ChatGPT-4's diagnostic and decision-making capacity in the clinical management of bone-related injuries of the foot and ankle.

METHODS: Eight bone-related foot and ankle orthopaedic cases were presented to ChatGPT-4, and its responses were subsequently evaluated by three fellowship-trained foot and ankle orthopaedic surgeons. Each case was scored on five criteria using a Likert scale, giving a total score ranging from 5 (lowest) to 25 (highest). ChatGPT-4 was referred to as "Dr. GPT," establishing a peer dynamic so that the chatbot emulated the role of an orthopaedic surgeon.

RESULTS: The average score across all criteria for each case was 4.53 of 5, corresponding to an overall average sum score of 22.7 of 25 for all cases. The pathology with the highest score was the second metatarsal stress fracture (24.3), whereas the case with the lowest score was hallux rigidus (21.3). Kendall correlation analysis of interrater reliability showed variable correlation among surgeons, without statistical significance.

CONCLUSION: ChatGPT-4 effectively diagnosed and provided appropriate treatment options for simple bone-related foot and ankle cases. Importantly, ChatGPT did not fabricate treatment options (ie, the hallucination phenomenon that has been well documented in the literature) and received its second-highest overall average score on this criterion. ChatGPT struggled to provide comprehensive information beyond standard treatment options. Overall, ChatGPT has the potential to serve as a widely accessible resource for patients and nonorthopaedic clinicians, although limitations may exist in the delivery of comprehensive information.
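The scoring arithmetic described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the ratings in the usage note are invented placeholders rather than study data, and Kendall's tau-a is assumed because the abstract does not say which Kendall statistic was used. A per-criterion mean of 4.53 corresponds to a sum of about 22.65, in line with the reported 22.7 of 25.

```python
from itertools import combinations


def sum_score(criterion_scores):
    """Add five Likert ratings (each 1-5) into a total between 5 and 25."""
    if len(criterion_scores) != 5:
        raise ValueError("expected exactly five criterion scores")
    if any(not 1 <= s <= 5 for s in criterion_scores):
        raise ValueError("each criterion score must lie between 1 and 5")
    return sum(criterion_scores)


def kendall_tau_a(x, y):
    """Kendall's tau-a between two raters' scores for the same cases.

    Assumption: tau-a (no tie correction) is used for simplicity;
    the study may have used a different Kendall statistic.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 is a tie and counts toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

For example, with made-up sum scores from two hypothetical raters, `kendall_tau_a([23, 25, 21, 22], [22, 24, 23, 21])` compares how consistently the two raters rank the same four cases; values near 1 indicate agreement in ranking and values near 0 indicate little association.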
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,611 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,504 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,025 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations