OpenAlex · Updated hourly · Last updated: 07.05.2026, 01:30

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Gonarthrosis Advisor vs ChatGPT-5: quality and readability of artificial intelligence–generated patient education for knee osteoarthritis

2026 · 0 citations · Acta Orthopaedica et Traumatologica Turcica · Open Access
Open full text at the publisher

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Objective: The primary objective of this study was to compare the quality and readability of patient education materials generated by a general-purpose large language model (ChatGPT-5) with those generated by a guideline-based, fine-tuned model (the Gonarthrosis Advisor). The study aimed to quantify the performance gains achieved through domain-specific fine-tuning and reinforcement learning using osteoarthritis clinical guidelines.

Methods: Thirty frequently asked patient questions regarding knee osteoarthritis were compiled from Google's "People Also Ask" feature and outpatient clinical observations in May 2025. Responses were generated in Turkish by both the Gonarthrosis Advisor and ChatGPT-5 to reflect real-world patient education materials. Content quality was assessed by two independent orthopedic surgeons who were blinded to model identity to minimize bias. Both reviewers were co-authors of the article but did not participate in model construction or data analysis. The assessments used the DISCERN instrument, a validated 16-item measure for evaluating the reliability and quality of treatment-related information. Readability was analyzed using the Flesch-Kincaid Grade Level (FKGL), the Flesch Reading Ease Score (FRES), and Turkish-specific indices (Ateşman and Çakır-Demir). For comparability, the English-based indices were applied to translated versions of the responses, whereas the Turkish indices were applied to the original texts. All responses were anonymized and randomized prior to evaluation: model identifiers were removed, and each response was presented in a standardized format to ensure blinding of reviewers. Inter-rater reliability was measured using Cronbach's α. Normality assumptions were tested, and Wilcoxon signed-rank tests were used for statistical comparisons. As no human subjects or personal data were involved, ethical approval was not required.
Results: Mean DISCERN scores corresponded to the "good" category for the Gonarthrosis Advisor (66.4) and the "moderate" category for ChatGPT-5 (54.2), according to established cut-off thresholds. The Gonarthrosis Advisor achieved significantly higher DISCERN scores than ChatGPT-5 (66.4 ± 4.8 vs. 54.2 ± 5.6; P < .001), with high inter-rater reliability (Cronbach's α = 0.86). Readability metrics favored the Gonarthrosis Advisor across all indices: a lower FKGL (7.8 ± 0.7 vs. 9.6 ± 0.9) and higher FRES (54.3 ± 3.4 vs. 46.7 ± 3.7), Ateşman (92.0 ± 4.2 vs. 84.3 ± 4.9), and Çakır-Demir (111.7 ± 5.1 vs. 106.9 ± 5.4) scores (all P < .001).

Conclusion: Fine-tuning large language models with guideline-based content and reinforcement learning improves the quality, neutrality, and accessibility of artificial intelligence–generated patient education materials, offering a scalable tool to enhance health literacy and support shared decision-making in knee osteoarthritis care.

Cite this article as: Öner SK, Demirkiran ND, Canlı EA, Bilir A. Gonarthrosis Advisor vs ChatGPT-5: quality and readability of AI–generated patient education for knee osteoarthritis. Acta Orthop Traumatol Turc. 2026;60(2):0618. doi: 10.5152/j.aott.2026.25618.
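For context, the two English-language readability indices used in the study have standard published formulas. The sketch below is not the authors' implementation; it simply computes FKGL and FRES from pre-tokenized counts. Syllable counting is language-dependent and is assumed to be done upstream, and the example counts are hypothetical.

```python
def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (standard formula): approximate U.S.
    school grade needed to understand the text; lower = more readable."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def fres(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (standard formula): 0-100 scale;
    higher = easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

# Hypothetical text: 100 words, 10 sentences, 150 syllables.
grade = fkgl(100, 10, 150)  # ≈ 6.0 (roughly sixth-grade level)
ease = fres(100, 10, 150)   # ≈ 69.8 (fairly easy to read)
```

Note the inverse relationship reported in the Results: a lower FKGL and a higher FRES both indicate more accessible text, so the two indices pointing the same way is expected rather than an independent confirmation.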

Topics

Artificial Intelligence in Healthcare and Education · Health Literacy and Information Accessibility · Social Media in Health Education