This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Can large language models reliably educate patients after kyphoplasty? A clinician-rated comparative study of ChatGPT and Gemini
0
Citations
6
Authors
2026
Year
Abstract
Large language models (LLMs), such as ChatGPT and Google Gemini, are increasingly used in medicine for purposes ranging from medical education to research. Given the accessibility of consumer-facing models, patients may turn to them for answers to their medical questions. This study compared outputs from ChatGPT and Google Gemini in response to common postoperative questions from patients after kyphoplasty. Thirteen common postoperative questions were compiled and posed to ChatGPT and Gemini. Five clinicians assessed the clinical accuracy and appropriateness of the responses using a 5-point Likert scale. Reviewers were blinded to model identity. Readability was evaluated by three raters using the Flesch-Kincaid grade level and a 3-point Likert scale. Matched-pair t-tests were used to compare responses from ChatGPT and Google Gemini, with statistical significance defined as a p-value < 0.05. ChatGPT responses were more accurate (p < 0.001) and more appropriate (p < 0.01) than Gemini's. ChatGPT's average Flesch-Kincaid grade level was 12.2, compared to 13.0 for Gemini (p = 0.05). On the 3-point Likert scale for readability, ChatGPT scored an average of 1.56/2, while Gemini scored 1.85/2 (p = 0.01). ChatGPT outperformed Gemini in clinical accuracy and appropriateness of responses. The results for readability were mixed: the Flesch-Kincaid grade level was lower for ChatGPT's responses (12.2 vs. 13.0), while the Likert scale showed that Gemini's responses were easier to read. Although ChatGPT demonstrated better clinical accuracy and appropriateness, LLMs should not replace clinician-delivered postoperative counseling.
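The two quantitative tools named in the abstract can be sketched briefly. The following is a minimal illustration, not the authors' analysis code: it uses the standard Flesch-Kincaid grade level formula and the usual matched-pair t statistic. The syllable counter is a crude vowel-group heuristic, and all function names and sample inputs are assumptions made for illustration only; none of the values come from the study.

```python
import math
import re
from statistics import mean, stdev


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def paired_t_statistic(a, b):
    """Matched-pair t statistic and degrees of freedom for two
    equal-length samples (for example, ratings of the same questions)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical ratings for four questions; NOT data from the study.
    model_a = [4, 5, 3, 5]
    model_b = [3, 4, 3, 3]
    print(paired_t_statistic(model_a, model_b))
    print(flesch_kincaid_grade("Rest for two days. Avoid heavy lifting."))
```

Converting the t statistic to a p-value requires the t distribution with n - 1 degrees of freedom, which a statistics package would normally supply.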
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,611 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,504 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,025 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,835 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations