This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Can ChatGPT-4 Think Like an Orthopaedic Surgeon? Testing Clinical Judgement and Diagnostic Ability in Bony and Soft Tissue Pathologies of the Foot and Ankle
0
Citations
6
Authors
2025
Year
Abstract
Submission Type: Achilles Tendon Rupture
Research Type: Level 4 – Case series

Introduction/Purpose: ChatGPT-4, a chatbot able to carry on human-like conversation, has attracted attention after demonstrating the aptitude to pass professional licensure examinations and to perform at the level of a postgraduate year three orthopaedic surgery resident on Orthopaedic In-Service Training Examination question bank sets. The purpose of this study was to explore the diagnostic and decision-making capacities of ChatGPT-4 in clinical management, specifically assessing accuracy in the identification and treatment of soft tissue foot and ankle pathologies.

Methods: This study presented 16 soft tissue-related foot and ankle cases to ChatGPT-4, with each case assessed by 3 fellowship-trained foot and ankle orthopaedic surgeons. The evaluation system comprised 5 criteria, each scored on a Likert scale, yielding sum scores from 5 (lowest) to 25 (highest possible). The criteria were: (1) stating the correct diagnosis, (2) stating the most appropriate procedure, (3) identifying alternative treatments, (4) providing comprehensive information beyond treatment, and (5) not mentioning nonexistent therapies. ChatGPT-4 was addressed as “Dr. GPT,” using role prompting to encourage step-by-step processing and establish a peer dynamic so that the chatbot emulated the role of an orthopaedic surgeon.

Results: The average score across all criteria for all 16 cases was 4.47, with an average sum score of 22.4. The plantar fasciitis case received the highest score, with an average sum score of 24.7. The lowest score was observed in the peroneal tendon tear case, with an average sum score of 16.3. Subgroup analyses of each of the 5 criteria using Friedman Rank Sum tests showed no statistically significant differences in surgeon grading. Criterion 5 (no mention of nonexistent treatment options) and criterion 1 (ChatGPT’s ability to correctly diagnose) received the highest subgroup scores, 4.88 and 4.77, respectively. The lowest score was observed for criterion 4 (4.05), which evaluated whether ChatGPT-4 provided comprehensive information beyond treatment options.

Conclusion: This study demonstrates that ChatGPT-4 effectively diagnosed and provided reliable treatment options for most of the soft tissue foot and ankle cases presented, with consistency among surgeon evaluators. Assessment of the individual criteria revealed that ChatGPT-4 was most effective at diagnosing and suggesting appropriate treatment, but limited in its ability to provide comprehensive information and alternative treatment options. Additionally, the chatbot did not suggest fabricated treatment options, a common concern in prior literature. This resource could be useful for clinicians seeking reliable patient education materials without fear of inconsistencies, though comprehensive information beyond treatment may be limited.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,380 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,243 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,671 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,496 citations