This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Agreement of ChatGPT with Clinical Practice Guidelines for Knee Osteoarthritis Treatment
0
Citations
5
Authors
2026
Year
Abstract
Purpose: This cross-sectional study evaluated whether Chat Generative Pre-Trained Transformer (ChatGPT) GPT-3.5 and GPT-4o provide treatment information consistent with high-quality clinical practice guidelines (CPGs) for knee osteoarthritis (OA).

Methods: High-quality CPGs published in the past decade were identified via PubMed and PEDro, with the search updated on November 11, 2024. Guidelines were appraised using the AGREE II tool. GPT-3.5 and GPT-4o were queried with common treatment-related questions, and their responses were compared to CPG recommendations. Two independent reviewers conducted a thematic content analysis of the GPT-3.5 and GPT-4o responses using a deductive–inductive codebook, iteratively refined through consensus, to identify major themes and subthemes and their frequencies. Consistency between ChatGPT outputs and CPGs was categorized as high agreement (4/4 “yes” responses), moderate agreement (3/4 “yes” responses), or low agreement (2/4 or fewer “yes” responses, or 2/4 not reported). Study selection and data extraction were performed independently by two reviewers, with a third reviewer consulted when necessary. Inter-reviewer agreement for data extraction, assessing alignment between ChatGPT and the CPGs, was evaluated using percentage agreement to measure consistency in data identification and categorization.

Results: Four high-quality guidelines were identified and analyzed. GPT-3.5 and GPT-4o generated 14 and 10 questions, respectively, yielding 10 themes and 33 subthemes. The 10 themes were: exercise and physical therapy, lifestyle modifications, medications, supplements and herbal remedies, assistive devices, additional therapies, education, consultation, intra-articular injections, and surgical and advanced treatments. Among the subthemes, 6.06% demonstrated high agreement (e.g., low-impact aerobic exercises, patient education), 18.18% moderate agreement (e.g., strengthening, weight management), and 75.75% low agreement across interventions. Excluding the physical therapy-specific CPG increased agreement for treatments such as analgesics, NSAIDs, glucosamine and chondroitin, and intra-articular corticosteroid injections.

Conclusion: While GPT-3.5 and GPT-4o demonstrated high or moderate agreement with CPGs for certain themes, most subthemes showed low agreement. A separate analysis excluding a CPG developed specifically for physical therapists modestly improved agreement levels but did not alter the overall pattern. These findings highlight the need for ongoing refinement of AI tools and underscore the importance of clinicians critically evaluating AI-generated content, particularly as patients may increasingly rely on such tools to guide self-management decisions.
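The abstract's consistency rule (high = 4/4 "yes", moderate = 3/4 "yes", low = 2/4 or fewer "yes" or 2/4 not reported) can be sketched as a small classifier. This is a minimal illustration only; the function name, vote labels, and input format are assumptions, not taken from the study's materials.

```python
def categorize_agreement(cpg_votes):
    """Classify ChatGPT-CPG consistency from four per-guideline votes.

    Each element of cpg_votes is one guideline's verdict on a subtheme:
    "yes", "no", or "not reported" (labels are illustrative assumptions).
    """
    yes = cpg_votes.count("yes")
    if yes == 4:
        return "high"       # 4/4 "yes" responses
    if yes == 3:
        return "moderate"   # 3/4 "yes" responses
    return "low"            # 2/4 or fewer "yes", or 2/4 not reported


print(categorize_agreement(["yes", "yes", "yes", "yes"]))  # high
print(categorize_agreement(["yes", "yes", "yes", "no"]))   # moderate
print(categorize_agreement(["yes", "no", "not reported", "not reported"]))  # low
```

Under this rule, dropping one of the four guidelines (as in the study's secondary analysis excluding the physical therapy-specific CPG) changes the denominator and can shift a subtheme into a higher category.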
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,578 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,470 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,984 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,814 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations