OpenAlex · Updated hourly · Last updated: 17 March 2026, 06:44

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Retrospective Comparison of Artificial Intelligence and the Orthopaedic Multi-disciplinary Team in the Management of Intracapsular Neck of Femur Fractures

2025 · 0 citations · Cureus · Open Access

Citations: 0 · Authors: 3 · Year: 2025

Abstract

Introduction: Artificial intelligence (AI) tools, such as ChatGPT, could potentially support junior clinicians in making initial operative decisions for hip fractures. However, the safety and reliability of such use are uncertain. This study compared ChatGPT's management recommendations for patients with intracapsular neck of femur (NOF) fractures to decisions made by orthopaedic consultants and evaluated which patient factors influenced those recommendations.

Methods: We identified a retrospective cohort of patients admitted with an intracapsular NOF fracture over an 18-week period to a United Kingdom District General Hospital. We collected patients' age, sex, comorbidities, mobility status, and the 4 A's Test (4AT) score. De-identified data were entered into ChatGPT with instructions to recommend management based on National Institute for Health and Care Excellence guidance; ChatGPT's recommendations were compared with the operation recorded at the Trauma Meeting. When ChatGPT and consultants disagreed, we provided ChatGPT with the consultant's decision and rationale and recorded the revised recommendation. We then validated ChatGPT on a separate anonymized patient set. We used Cohen's Kappa to assess agreement and applied multinomial and binomial logistic regression and two-proportion z-tests to assess significance.

Results: One hundred five patients were included in the primary cohort, and 30 in the validation cohort. Initial agreement between ChatGPT and consultants was low [κ = 0.03, 95% confidence interval (CI) -0.11 to 0.19, p = 0.70]. After in-session adjustment, agreement rose (κ = 0.93, 95% CI 0.84 to 1.00, p < 0.001), a statistically significant improvement (z = 10.3, p < 0.001). The only post-adjustment factor that significantly influenced ChatGPT's recommendations was age, showing that increasing age was associated with a reduced likelihood of receiving total hip replacement rather than hemiarthroplasty (odds ratio = 0.81, 95% CI 0.70 to 0.94; p = 0.006). In the validation cohort, agreement fell (κ = 0.29, 95% CI -0.04 to 0.60, p = 0.06) and did not significantly differ from the initial agreement (z = -1.51, p = 0.13).

Conclusion: ChatGPT did not reliably replicate orthopaedic consultant decision-making for intracapsular NOF fractures. Although in-session adjustments produced high superficial concordance, this effect did not generalize to an independent dataset. The model's tendency to conform to user prompts risks creating false confidence in its outputs. Clinically focused, validated AI systems require further evaluation before they can safely augment operative decision-making.
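The abstract names Cohen's Kappa and two-proportion z-tests but does not show how they are computed. The following is a minimal sketch of both statistics using only the Python standard library. It is not the authors' analysis code: the function names, the pooled form of the z-test, and the toy labels are all assumptions made for illustration, and the toy data are hypothetical rather than taken from the study.

```python
from collections import Counter
from math import erfc, sqrt


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Pooled two-proportion z-test (assumed variant); returns (z, two-sided p)."""
    p_1, p_2 = successes_1 / n_1, successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical toy labels, NOT study data: six cases, two treatment options.
consultant = ["hemi", "hemi", "THR", "THR", "hemi", "hemi"]
model = ["hemi", "THR", "THR", "THR", "hemi", "hemi"]
kappa = cohens_kappa(consultant, model)

# Hypothetical agreement counts in two groups of 100 cases each.
z, p = two_proportion_z_test(60, 100, 40, 100)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a model that agrees with consultants on many cases can still score near zero when both mostly choose the same common option.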

Topics

Hip and Femur Fractures · Medical Imaging and Analysis · Artificial Intelligence in Healthcare and Education