This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A Comparative Analysis of GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5 in Answering Total Knee Replacement-Related Questions
Citations: 1
Authors: 7
Year: 2026
Abstract
Background: Artificial intelligence (AI) chatbots are increasingly used to provide medical information. However, systematic evaluations of their accuracy and reliability in orthopaedic surgery, particularly in total knee replacement (TKR), remain limited.
Purpose: To systematically compare the performance of several AI chatbots, focusing on their ability to provide accurate and reliable information related to TKR.
Study Design: Cohort study; Level of evidence, 2.
Methods: A total of 43 clinically relevant TKR-related frequently asked questions (FAQs) were selected based on Google search trends and expert consultation. Questions were categorized into 6 key domains: (1) general/procedure-related information, (2) indications and outcomes, (3) risks and complications, (4) pain and postoperative recovery, (5) specific activities after surgery, and (6) alternatives and variations. Each question was submitted to 5 chatbot models (GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5) for response generation. Two independent orthopaedic surgeons rated the chatbots' responses for accuracy and relevance on a 5-point Likert scale. Responses were anonymized so that the evaluators were blinded to chatbot identity, to prevent bias. Differences in accuracy among the chatbot models were analyzed by analysis of variance, and relevance was compared using the Kruskal-Wallis test.
Results: < .001).
Conclusion: This study demonstrates that GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 can provide highly accurate and relevant responses to TKR-related queries, while Gemini Advanced underperforms.
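The two tests named in the Methods can be illustrated with a minimal, dependency-free sketch. It computes only the test statistics (the one-way ANOVA F and the Kruskal-Wallis H with tie correction), not P values. The function names and the Likert scores below are hypothetical illustrations, not the authors' code or the study's data.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of score groups."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def average_ranks(values):
    """Rank values from 1 to n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction.

    Undefined (division by zero) if every score is identical.
    """
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    ranks = average_ranks(all_scores)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in all_scores:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction


# Hypothetical 5-point Likert ratings for three unnamed models (not study data).
example_scores = [[5, 4, 5, 4], [4, 4, 5, 5], [3, 2, 3, 4]]
f_stat = anova_f(example_scores)
h_stat = kruskal_h(example_scores)
```

In practice, a statistics library would normally be used to obtain P values as well; SciPy, for example, provides `scipy.stats.f_oneway` and `scipy.stats.kruskal` for these two tests.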