OpenAlex · Updated hourly · Last updated: 03.05.2026, 03:03

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Comparative Analysis of GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5 in Answering Frequently Asked Questions Regarding High Tibial Osteotomy

2025 · 0 citations · Orthopaedic Journal of Sports Medicine · Open Access

Citations: 0 · Authors: 6 · Year: 2025

Abstract

Background: Large language model (LLM)-based chatbots, such as ChatGPT and Gemini, have become widely used sources of medical information. No study has assessed the performance of LLM chatbots in providing clinically reliable information on high tibial osteotomy (HTO). Purpose: To evaluate the accuracy and relevance of different LLM chatbots in responding to frequently asked questions (FAQs) about HTO. Study Design: Cross-sectional study. Methods: A total of 35 FAQs about HTO were curated from online sources and categorized into 6 categories: general/procedure related, indications for surgery and outcomes, risks and complications of surgery, pain and postoperative recovery, specific activities after surgery, and alternatives to and variations of HTO. These questions were used as input to 5 different LLM chatbots: ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Omni, Gemini Advanced, and Gemini 1.5. Responses were collected from July 12 to 14, 2024 (ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Omni, and Gemini Advanced) and on September 26, 2024 (Gemini 1.5). Two independent orthopaedic surgeons assessed the responses using a 5-point Likert scale (1 = very incorrect/very irrelevant, 5 = very accurate/very relevant). Responses were anonymized to blind evaluators to chatbot identities. Differences in accuracy among chatbots were assessed using analysis of variance, and differences in relevance using the Kruskal-Wallis test. Results: Accuracy did not differ significantly among the chatbots (P = .09). All models provided relevant answers to all questions (35/35; 100%), except for Gemini Advanced (30/35; 85.7%). Conclusion: This study showed that ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Omni, and Gemini 1.5 provided accurate and relevant responses on HTO, whereas Gemini Advanced exhibited limitations and underperformed in comparison with the other models.
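The statistical comparison described in the Methods (analysis of variance for accuracy, Kruskal-Wallis for relevance across five chatbots) can be sketched as follows. This is a minimal illustration, not the authors' code: the Likert ratings below are invented placeholders (the study rated 35 questions per chatbot by two surgeons), and the chatbot names are taken from the abstract.

```python
# Sketch of the abstract's statistical protocol on made-up 5-point
# Likert ratings: one list of per-question scores per chatbot.
from scipy.stats import f_oneway, kruskal

ratings = {
    "ChatGPT-3.5":     [5, 4, 5, 4, 5, 4, 5],
    "ChatGPT-4":       [5, 5, 4, 5, 5, 4, 5],
    "ChatGPT-4 Omni":  [5, 5, 5, 4, 5, 5, 4],
    "Gemini Advanced": [3, 4, 3, 4, 3, 4, 3],
    "Gemini 1.5":      [4, 5, 4, 5, 4, 4, 5],
}

# One-way ANOVA (used in the study for accuracy ratings).
f_stat, p_anova = f_oneway(*ratings.values())

# Kruskal-Wallis test (used in the study for relevance ratings);
# a rank-based alternative that does not assume normality.
h_stat, p_kw = kruskal(*ratings.values())

print(f"ANOVA: F = {f_stat:.2f}, P = {p_anova:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_kw:.3f}")
```

Because Likert ratings are ordinal, the rank-based Kruskal-Wallis test is often preferred over ANOVA for such data; the study applies one test per outcome as described above.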
