This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

AI vs AI: clinical reasoning performance of language models in orthopedic rehabilitation

2025 · 1 citation · Journal of Health Sciences and Medicine · Open Access

Abstract

Aims: This study compared the clinical reasoning and treatment planning performance of three advanced large language models (LLMs), ChatGPT-4o, Gemini 2.5 Pro, and DeepSeek-V3, in orthopedic rehabilitation. Their responses to standardized clinical scenarios were evaluated for alignment with evidence-based physiotherapy practice, focusing on relevance, accuracy, completeness, applicability, and safety awareness.
Methods: Three fictional but clinically realistic scenarios involving rotator cuff tendinopathy, lumbar disc herniation with radiculopathy, and anterior cruciate ligament (ACL) reconstruction were developed by an experienced physiotherapist. Each scenario was submitted independently to the three AI models on the same day using identical prompts. A blinded expert physiotherapist rated each model’s detailed responses on a 5-point Likert scale across five domains (clinical accuracy, relevance, completeness, applicability, and safety awareness), giving a maximum total of 25 points per scenario. Mean scores and descriptive statistics were calculated.
Results: DeepSeek-V3 was consistently rated highest (5/5) across all domains and scenarios, with comprehensive and clinically rigorous plans. ChatGPT-4o performed strongly overall, with total scores of 19 to 20 out of 25, though its completeness scores were lower because its milestones were less specific. Gemini 2.5 Pro scored lower overall (average total score 18/25), with particular weaknesses in applicability and clinical relevance in complex cases such as lumbar disc herniation. All models provided evidence-based treatment approaches emphasizing pain management, postural correction, gradual strengthening, and return-to-activity progression. Differences arose in the emphasis on lifestyle modification, the depth of patient education, and the integration of psychosocial factors; Gemini alone addressed psychological readiness in ACL rehabilitation.
Conclusion: AI-generated rehabilitation plans show substantial concordance with current physiotherapy guidelines but vary in detail and clinical practicality. DeepSeek-V3 outperformed the other models in consistency and safety considerations, while ChatGPT-4o balanced clinical accuracy with moderate completeness. Gemini 2.5 Pro’s inclusion of biopsychosocial components offers valuable insights but may require further refinement for clinical applicability. These findings highlight the potential and current limitations of AI tools in orthopedic rehabilitation, suggesting careful model selection based on clinical context and user needs.

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Meta-analysis and systematic reviews
