This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial Intelligence Provides Helpful Information to Patients Undergoing ACLR, But Patients May Not Fully Understand The Answers
Citations: 0
Authors: 7
Year: 2026
Abstract
Background: With the increasing integration of medicine and technology, evaluating the accuracy of online resources and newly introduced artificial intelligence (AI) platforms is critical. Despite the numerous chatbot systems available, there is little information regarding their reliability in providing accurate information about common orthopaedic procedures such as anterior cruciate ligament reconstruction (ACLR).

Purpose/Hypothesis: This study aims to compare the accuracy and readability of information provided by OpenAI's ChatGPT and Google's Gemini in response to commonly asked patient questions prior to ACLR. The authors hypothesized that the AI systems would provide useful information but that the information would not be suited to patient comprehension.

Methods: The authors compiled a list of common questions directed to orthopaedic surgeons prior to ACLR and entered the questions into ChatGPT 3.5 and Google's Gemini. A group of board-certified orthopaedic surgeons independently graded each system's answers on a four-point scale, with 1 indicating an excellent response and 4 an unsatisfactory response requiring major clarification. Each surgeon then provided an overall judgment as to whether the chat system's response would be helpful to patients. Additionally, each response was evaluated for readability using the Flesch-Kincaid Reading Ease and Flesch-Kincaid Grade Level.

Results: Both ChatGPT and Gemini provided responses to all 13 questions, each including disclaimers recommending consultation with medical professionals. Surgeon grading showed that ChatGPT responses received average scores ranging from 1.15 to 2.23, while Gemini responses ranged from 1.31 to 3.15. ChatGPT scores varied significantly across questions (F = 4.13, p = 0.00016) and among surgeons (F = 3.18, p = 0.0196), whereas Gemini scores were more consistent across questions but likewise varied between surgeons (F = 13.11, p < 0.001). Readability analysis favored Gemini, with a Flesch-Kincaid Reading Ease score of 43.27 versus 23.5 for ChatGPT (p < 0.001). The average grade level required to understand a ChatGPT response was 12, compared with 11.06 for a Gemini response (p < 0.001).

Conclusion: As graded by board-certified orthopaedic surgeons, both ChatGPT and Gemini provided adequate, helpful responses to patient questions regarding ACLR. Despite this, the responses required a reading comprehension level well above the average American reading level of 7th/8th grade, raising concerns regarding patient comprehension. It is essential for the medical community to continue critically evaluating AI-generated information to ensure it is accurate, reliable, and easily understandable for patients.