This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Can Artificial Intelligence Align with Evidence? Performance of ChatGPT-4o in Knee Osteoarthritis Surgical Guidelines
Citations: 0
Authors: 6
Year: 2026
Abstract
Artificial intelligence large language models (LLMs) such as ChatGPT are increasingly used in clinical settings, yet their reliability in reproducing evidence-based recommendations remains uncertain. This study aimed to evaluate the performance of ChatGPT-4o in addressing clinical practice guideline (CPG) recommendations for the surgical management of knee osteoarthritis and total knee arthroplasty (TKA). An observational cross-sectional study was conducted in September 2025. Twenty recommendations from the most recent American Academy of Orthopaedic Surgeons CPG on TKA were translated into structured clinical questions and submitted to ChatGPT-4o. Each query was entered three times in independent sessions to evaluate textual consistency. Two independent reviewers with expertise in musculoskeletal physiotherapy and orthopedics appraised the chatbot's answers, classifying them according to the CPG framework ("should do," "could do," "do not do," "uncertain"). Agreement between reviewers and alignment with CPG recommendations were assessed using Cohen's and Fleiss' kappa coefficients. ChatGPT-4o achieved an overall concordance of 60% with the CPG recommendations, representing fair agreement (κ = 0.392, p = 0.005). Internal text consistency across repeated trials was low, with several responses showing unacceptable similarity levels (<50%). Inter-rater reliability ranged from moderate to almost perfect (κ = 0.547-0.946). Although ChatGPT-4o provided clinically acceptable answers in several domains, discrepancies persisted, particularly in recommendations regarding functional outcomes and rehabilitation strategies. ChatGPT-4o demonstrated moderate accuracy and heterogeneous reliability when reproducing CPG recommendations for TKA. While the model may serve as a supportive tool for education and patient communication, its variability and incomplete adherence to guidelines highlight the need for cautious integration and professional oversight in clinical decision-making.
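The inter-rater agreement statistic used in the abstract, Cohen's kappa, corrects observed agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal Python sketch of the computation, using the study's four CPG response categories as illustrative labels; this is not the authors' analysis code, only a standard-formula example:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[lab] / n) * (counts_b[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical reviewer classifications over four recommendations.
reviewer_1 = ["should do", "should do", "could do", "do not do"]
reviewer_2 = ["should do", "could do", "could do", "do not do"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # → 0.636
```

With the conventional Landis-Koch interpretation bands, the paper's reported κ = 0.392 falls in the "fair" range (0.21-0.40), and 0.946 in the "almost perfect" range (0.81-1.00), matching the abstract's wording.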
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,422 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,300 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,734 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,519 citations