This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A Comparative Analysis of GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5 in Answering Frequently Asked Questions Regarding High Tibial Osteotomy
0
Citations
6
Authors
2025
Year
Abstract
Background: Large language model (LLM)-based chatbots, such as ChatGPT and Gemini, have become widely used sources of medical information. No study has assessed the performance of LLM chatbots in providing clinically reliable information on high tibial osteotomy (HTO).
Purpose: To evaluate the accuracy and relevance of different LLM chatbots in responding to frequently asked questions (FAQs) about HTO.
Study Design: Cross-sectional study.
Methods: A total of 35 FAQs about HTO were curated from online sources and categorized into 6 categories: general/procedure related, indications for surgery and outcomes, risks and complications of surgery, pain and postoperative recovery, specific activities after surgery, and alternatives to and variations of HTO. These questions were used as input to 5 different LLM chatbots: ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Omni, Gemini Advanced and Gemini 1.5. Responses were collected from July 12 to 14, 2024 (ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Omni, and Gemini Advanced) and on September 26, 2024 (Gemini 1.5). Two independent orthopaedic surgeons assessed the responses using a 5-point Likert scale (1 = very incorrect/very irrelevant, 5 = very accurate/very relevant). Responses were anonymized to blind evaluators to chatbot identities. Differences in accuracy among chatbots were assessed using analysis of variance, and differences in relevance using the Kruskal-Wallis test.
Results: = .09). All models provided relevant answers to all questions (35/35; 100%), except for Gemini Advanced (30/35; 85.7%).
Conclusion: This study showed that ChatGPT-3.5, ChatGPT-4, ChatGPT-4 Omni, and Gemini 1.5 provided accurate and relevant responses on HTO, whereas Gemini Advanced exhibited limitations and underperformed in comparison with the other models.
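The two tests named in the Methods (analysis of variance for accuracy, Kruskal-Wallis for relevance) can be illustrated with a minimal pure-Python sketch. This is not the authors' analysis code: the function names `anova_f` and `kruskal_h`, the chatbot labels, and the Likert scores below are all hypothetical and made up for illustration, and the sketch computes only the test statistics, not p-values.

```python
from itertools import groupby


def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples (one list per group)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = sorted((x, i) for i, g in enumerate(groups) for x in g)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    pos = 0  # number of observations already ranked
    for _, run in groupby(pooled, key=lambda pair: pair[0]):
        run = list(run)
        t = len(run)
        avg_rank = pos + (t + 1) / 2  # mean of ranks pos+1 .. pos+t
        for _, group_index in run:
            rank_sums[group_index] += avg_rank
        tie_term += t ** 3 - t
        pos += t
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical 5-point Likert scores (NOT data from the study).
scores = {
    "chatbot_a": [5, 4, 5, 4, 5],
    "chatbot_b": [4, 4, 5, 3, 4],
    "chatbot_c": [3, 2, 4, 3, 3],
}
samples = list(scores.values())
print("ANOVA F:", anova_f(samples))
print("Kruskal-Wallis H:", kruskal_h(samples))
```

In practice, `scipy.stats.f_oneway` and `scipy.stats.kruskal` return these statistics together with p-values. The rank-based Kruskal-Wallis test is the usual choice when ordinal ratings are heavily tied or clustered at one end of the scale, which may be why it was used for relevance here.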
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,557 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,447 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,944 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations