Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
ChatGPT and Gemini Are Not Consistently Concordant With the 2020 American Academy of Orthopaedic Surgeons Clinical Practice Guidelines When Evaluating Rotator Cuff Injury
4
Zitationen
8
Autoren
2025
Jahr
Abstract
PURPOSE: To evaluate the accuracy of suggestions given by ChatGPT and Gemini (previously known as "Bard"), 2 widely used publicly available large language models, to evaluate the management of rotator cuff injuries. METHODS: The 2020 American Academy of Orthopaedic Surgeons (AAOS) Clinical Practice Guidelines (CPGs) were the basis for determining recommended and non-recommended treatments in this study. ChatGPT and Gemini were queried on 16 treatments based on these guidelines examining rotator cuff interventions. The responses were categorized as "concordant" or "discordant" with the AAOS CPGs. The Cohen κ coefficient was calculated to assess inter-rater reliability. RESULTS: ChatGPT and Gemini showed concordance with the AAOS CPGs for 13 of the 16 treatments queried (81%) and 12 of the 16 treatments queried (75%), respectively. ChatGPT provided discordant responses with the AAOS CPGs for 3 treatments (19%), whereas Gemini provided discordant responses for 4 treatments (25%). Assessment of inter-rater reliability showed a Cohen κ coefficient of 0.98, signifying agreement between the raters in classifying the responses of ChatGPT and Gemini to the AAOS CPGs as being concordant or discordant. CONCLUSIONS: ChatGPT and Gemini do not consistently provide responses that align with the AAOS CPGs. CLINICAL RELEVANCE: This study provides evidence that cautions patients not to rely solely on artificial intelligence for recommendations about rotator cuff injuries.
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.549 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.443 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7.941 Zit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6.792 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.