This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Can Large Language Models Translate Spine Surgery Guidelines for Patients? A Pilot Validation Study Using AO Spine Recommendations for Degenerative Cervical Myelopathy
0
Citations
11
Authors
2026
Year
Abstract
STUDY DESIGN: Pilot cross-sectional validation study.
OBJECTIVES: To evaluate the accuracy of ChatGPT-5 in generating clinician- and patient-facing responses to the 5 AO Spine degenerative cervical myelopathy (DCM) guidelines.
SUMMARY OF BACKGROUND DATA: DCM is managed according to AO Spine's treatment recommendations; however, these guidelines may be difficult to understand for many patients, who increasingly seek medical guidance from large language models such as ChatGPT.
METHODS: This pilot study queried ChatGPT-5 with clinical scenarios (severe, moderate, and mild myelopathy; nonmyelopathic cord compression with and without radiculopathy) extracted directly from the 5 AO Spine DCM guidelines, once for clinicians and once for patients at a ≤6th-grade reading level, using validated prompt templates. Outputs were independently graded by 3 fellowship-trained spine surgeons using a 3-point Likert scale (3=accurate, 2=partially accurate, 1=inaccurate). The accuracy of ChatGPT's outputs was analyzed, as was the readability of patient-facing outputs using the SMOG, FKGL, and FRE indices.
RESULTS: Clinician-facing outputs were highly accurate (mean 2.93±0.26), with near-perfect agreement and no outputs graded as inaccurate. Patient-facing responses were less consistent (mean 2.33±0.72; range: 1-3), with no full inter-rater agreement. Lower scores reflected omission of key details such as mJOA thresholds and clear distinctions between surgical and conservative care. Readability analyses yielded a mean SMOG of 10.4±1.1, FKGL of 6.9±0.7, and FRE of 72.4±5.3, indicating readability at a ∼7th-grade level, within the "fairly easy" comprehension range.
CONCLUSIONS: ChatGPT-5 reliably reproduced AO Spine's guideline recommendations for DCM in clinician-facing outputs. Patient-facing answers were generally aligned with the guidelines, but important details were sometimes lost, and readability was slightly above AMA recommendations. Spine expert review remains essential to ensure accurate patient education and adherence to guidelines.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,687 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,591 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,114 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,867 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations