This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluation of ChatGPT-4o as a Patient Information Tool for Common Orthopaedic Surgeries: Accuracy, Completeness, and Clinical Utility
0 Citations · 5 Authors · Year: 2025
Abstract
INTRODUCTION: Artificial intelligence chatbots, such as ChatGPT-4o ("omni"), a large language model developed by OpenAI that integrates text, image, and audio processing with web connectivity, have gained traction as potential patient education tools in orthopaedic surgery. This study aimed to evaluate the accuracy, completeness, and clinical utility of ChatGPT-4o's responses to common patient questions about six widely performed orthopaedic procedures.

METHODS: We assessed ChatGPT-4o's responses to five standardized patient-oriented queries for total knee arthroplasty, total hip arthroplasty, anterior cruciate ligament reconstruction, rotator cuff repair, anterior cervical diskectomy and fusion, and carpal tunnel release. Responses were generated using ChatGPT-4o's web-enabled version in January 2025. Two resident orthopaedic surgeons independently rated each response for accuracy, completeness, layperson clarity, misleading content, and conciseness using a structured binary rubric. The validated DISCERN instrument (16 items, maximum score 80) was adapted for quantitative assessment of information quality. Interrater reliability was assessed with Cohen kappa.

RESULTS: Overall, ChatGPT-4o generated accurate and structured responses, free of overt errors. The average DISCERN score across procedures was 43.5, classifying the information as fair. The highest average DISCERN score was for anterior cervical diskectomy and fusion (mean 45.8 ± 10.1), whereas the lowest was for rotator cuff repair (mean 41.6 ± 5.9). Factual accuracy was high (>90%), but 36% of responses contained some misleading or incomplete information. Responses explaining treatment alternatives were the most accurate and complete, whereas those outlining surgical risks performed worst. Interrater agreement was good (Cohen kappa = 0.64).
DISCUSSION: ChatGPT-4o provided generally accurate, clear, and empathetic explanations of common orthopaedic surgeries, offering a promising adjunct to conventional patient education. However, key limitations, particularly regarding alternative treatments, nuanced risks, and the lack of tailored advice, limit its stand-alone use in clinical practice. Careful oversight and clinician vetting remain essential.

CONCLUSIONS: ChatGPT-4o can supplement orthopaedic patient education by offering accessible, engaging content. However, notable gaps in detail and occasional misleading information necessitate careful review and contextual explanation by orthopaedic surgeons.