OpenAlex · Updated hourly · Last updated: 30.03.2026, 18:33

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

EVALUATING LARGE LANGUAGE MODELS IN PATIENT EDUCATION: A COMPARATIVE ANALYSIS OF CHATGPT AND GOOGLE GEMINI IN ADDRESSING FREQUENTLY ASKED QUESTIONS IN PERIACETABULAR OSTEOTOMY

2025 · 0 citations · Orthopaedic Proceedings

Citations: 0 · Authors: 5 · Year: 2025

Abstract

Large language models (LLMs) are rapidly gaining traction as sources of information across various fields, including healthcare. As patients may increasingly turn to these models for health-related inquiries, it becomes necessary to assess the accuracy and reliability of their responses. This study evaluates the performance of two leading LLMs, ChatGPT (OpenAI) and Google Gemini (Google DeepMind), in addressing common patient questions on periacetabular osteotomy (PAO). PAO procedures are performed on younger patients, a demographic that is more likely to engage with digital tools like LLMs to seek health information. ChatGPT and Gemini were selected for their popularity and accessibility and evaluated for their ability to provide patient education through conversational, human-like responses. An expert panel of fellowship-trained PAO surgeons curated a set of 10 commonly posed patient questions to simulate real inquiries they face on a regular basis. Responses from each LLM were evaluated by three experienced consultants who were blinded to the source of each response, using a 5-point Likert scale to assess clarity, accuracy, and completeness. ChatGPT demonstrated a significant advantage, achieving an average score of 4.17 compared to Gemini's 3.13 (t = -3.08, p = 0.006). ChatGPT's responses were frequently rated higher for completeness and clarity, particularly in areas requiring detailed explanations, and were often deemed to need minimal clarification. In contrast, Gemini's responses, though generally accurate, occasionally exhibited vagueness or minor inaccuracies that impacted their perceived reliability. These results suggest substantial variation in the ability of LLMs to provide effective support for patients using them for queries around surgery.
The current version of ChatGPT more consistently met expert standards for clarity and thoroughness. As the use of LLMs continues to grow, ChatGPT could play an increasingly valuable role in educating patients about hip surgery, potentially enhancing preoperative consultations, supporting informed decision-making, and providing postoperative advice.
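The reported comparison reduces to averaging blinded 5-point Likert ratings per model and testing the difference between the two sets of scores. A minimal sketch of that computation, using made-up per-question scores (not the study's data) and a paired t-statistic over per-question differences (the study's exact pairing and test variant are assumptions here):

```python
from statistics import mean
from math import sqrt

# Hypothetical mean Likert ratings per question (10 questions), one list
# per model. These numbers are illustrative only, not the study's data.
chatgpt = [4.3, 4.0, 4.7, 3.7, 4.3, 4.0, 4.7, 4.3, 3.7, 4.0]
gemini  = [3.0, 3.3, 2.7, 3.0, 3.7, 3.0, 2.7, 3.3, 3.7, 3.0]

# Paired t-statistic over per-question score differences (Gemini - ChatGPT).
diffs = [g - c for c, g in zip(chatgpt, gemini)]
n = len(diffs)
d_bar = mean(diffs)
sd = sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))  # sample std dev
t = d_bar / (sd / sqrt(n))

print(f"mean ChatGPT = {mean(chatgpt):.2f}, mean Gemini = {mean(gemini):.2f}, t = {t:.2f}")
```

A negative t here simply reflects the direction of the difference (Gemini minus ChatGPT), matching the sign convention of the reported t = -3.08.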

Topics

Artificial Intelligence in Healthcare and Education
Radiomics and Machine Learning in Medical Imaging