EVALUATING LARGE LANGUAGE MODELS IN PATIENT EDUCATION: A COMPARATIVE ANALYSIS OF CHATGPT AND GOOGLE GEMINI IN ADDRESSING FREQUENTLY ASKED QUESTIONS IN PERIACETABULAR OSTEOTOMY
Citations: 0 · Authors: 5 · Year: 2025
Abstract
Large language models (LLMs) are rapidly gaining traction as sources of information across many fields, including healthcare. As patients may increasingly turn to these models for health-related questions, it is necessary to assess the accuracy and reliability of their responses. This study evaluates the performance of two leading LLMs, ChatGPT (OpenAI) and Google Gemini (Google DeepMind), in answering common patient questions about periacetabular osteotomy (PAO). PAO is typically performed in younger patients, a demographic more likely to use digital tools such as LLMs to seek health information. ChatGPT and Gemini were selected for their popularity and accessibility and were evaluated on their ability to provide patient education through conversational, human-like responses. An expert panel of fellowship-trained PAO surgeons curated a set of 10 commonly asked patient questions to simulate the inquiries they encounter regularly. Responses from each LLM were evaluated by three experienced consultants, blinded to the source of each response, using a 5-point Likert scale to assess clarity, accuracy, and completeness. ChatGPT demonstrated a significant advantage, achieving an average score of 4.17 compared with Gemini's 3.13 (t = -3.08, p = 0.006). ChatGPT's responses were frequently rated higher for completeness and clarity, particularly in areas requiring detailed explanations, and were often judged to need minimal clarification. In contrast, Gemini's responses, though generally accurate, occasionally showed vagueness or minor inaccuracies that reduced their perceived reliability. These results suggest that there may be significant variation in how effectively LLMs support patients who use them for surgery-related queries.
The current version of ChatGPT more consistently met expert standards for clarity and thoroughness. As the use of LLMs continues to grow, ChatGPT could play an increasingly valuable role in educating patients about hip surgery, potentially enhancing preoperative consultations, supporting informed decision-making, and providing postoperative advice.
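The abstract reports mean Likert scores and a t-statistic, but this summary does not state which t-test variant was used or how the individual ratings were aggregated. As a minimal sketch, assuming an independent two-sample Student's t-test (equal variances) on the Likert ratings, the statistic can be computed with the Python standard library as follows. The function name `pooled_t_statistic` and the ratings below are hypothetical illustrations, not the study's code or data.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a: list[float], group_b: list[float]) -> tuple[float, int]:
    """Student's two-sample t-statistic (equal variances assumed) and its degrees of freedom.

    Hypothetical helper for illustration. A negative t means group_a has the lower
    mean, so the sign only reflects the order in which the groups are passed.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two ratings")
    # statistics.variance uses the sample (n - 1) denominator
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("ratings have zero variance; t is undefined")
    t = (mean(group_a) - mean(group_b)) / std_err
    return t, n_a + n_b - 2


if __name__ == "__main__":
    gemini = [3, 3, 4, 2, 3]   # hypothetical 5-point Likert ratings
    chatgpt = [4, 5, 4, 4, 5]  # hypothetical 5-point Likert ratings
    t, df = pooled_t_statistic(gemini, chatgpt)
    print(f"t = {t:.2f}, df = {df}")  # t = -3.50, df = 8
```

Turning t into a p-value requires the t-distribution's CDF, which the standard library does not provide; `scipy.stats.ttest_ind` computes the same equal-variance statistic together with its p-value. Whether the study used this variant, a paired test, or Welch's test cannot be determined from the abstract alone.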