This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Can Large Language Models Translate Spine Surgery Guidelines for Patients? A Pilot Validation Study Using AO Spine Recommendations for Degenerative Cervical Myelopathy

2026 · 0 citations · Clinical Spine Surgery A Spine Publication
Citations: 0

Authors: 11

Year: 2026

Abstract

STUDY DESIGN: Pilot cross-sectional validation study.

OBJECTIVES: To evaluate the accuracy of ChatGPT-5 in generating clinician- and patient-facing responses to the 5 AO Spine degenerative cervical myelopathy (DCM) guidelines.

SUMMARY OF BACKGROUND DATA: DCM is managed according to AO Spine's treatment recommendations. However, these guidelines may be difficult to understand for many patients, who are increasingly seeking medical guidance from large language models such as ChatGPT.

METHODS: This pilot study queried ChatGPT-5 with clinical scenarios (severe, moderate, and mild myelopathy; nonmyelopathic cord compression with and without radiculopathy) extracted directly from the 5 AO Spine DCM guidelines, once for clinicians and once for patients at a ≤6th-grade reading level, using validated prompt templates. Outputs were independently graded by 3 fellowship-trained spine surgeons on a 3-point Likert scale (3=accurate, 2=partially accurate, 1=inaccurate). The accuracy of ChatGPT's outputs was analyzed, as was the readability of patient-facing outputs using the SMOG, FKGL, and FRE indices.

RESULTS: Clinician-facing outputs were highly accurate (mean 2.93±0.26), with near-perfect agreement and no outputs graded as inaccurate. Patient-facing responses were less consistent (mean 2.33±0.72; range: 1-3), with no full inter-rater agreement. Lower scores reflected the omission of key details such as mJOA thresholds and clear distinctions between surgical and conservative care. Readability analyses yielded a mean SMOG of 10.4±1.1, FKGL of 6.9±0.7, and FRE of 72.4±5.3, indicating readability at approximately a 7th-grade level but within "fairly easy" comprehension.

CONCLUSIONS: ChatGPT-5 reliably reproduced AO Spine's guideline recommendations for DCM in clinician-facing outputs. Patient-facing answers were generally aligned with the guidelines, but important details were sometimes lost, and readability was slightly above AMA recommendations. Spine expert review remains essential to ensure accurate patient education and adherence to guidelines.
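
For readers unfamiliar with the three readability indices named in the abstract, the following is a minimal sketch of their standard published formulas. The abstract does not say which tool or implementation the authors used, so this is illustrative only; the function names and the input counts are hypothetical, and word, sentence, and syllable counting is assumed to have been done beforehand.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """FRE: higher scores mean easier text (roughly 70-80 is 'fairly easy')."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """FKGL: approximate US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG: grade level from the count of words with 3 or more syllables,
    normalized to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for one patient-facing response (not study data).
    words, sentences, syllables, polysyllables = 100, 5, 150, 8
    print(f"FRE  = {flesch_reading_ease(words, sentences, syllables):.1f}")
    print(f"FKGL = {flesch_kincaid_grade(words, sentences, syllables):.1f}")
    print(f"SMOG = {smog_index(polysyllables, sentences):.1f}")
```

Because the indices weight sentence length and syllable counts differently, the same text can score at different grade levels on each, which is consistent with the abstract reporting a SMOG value higher than the FKGL value.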

Similar works