This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini
40
Citations
5
Authors
2024
Year
Abstract
<i>Background and Objectives:</i> Large language models (LLMs) are emerging as valuable tools in plastic surgery, potentially reducing surgeons' cognitive load and improving patient outcomes. This study aimed to assess and compare the current state of the two most common and readily available LLMs, OpenAI's ChatGPT-4 and Google's Gemini Pro (1.0 Pro), in providing intraoperative decision support in plastic and reconstructive surgery procedures. <i>Materials and Methods:</i> We presented each LLM with 32 independent intraoperative scenarios spanning 5 procedures. We utilized a 5-point and a 3-point Likert scale for medical accuracy and relevance, respectively. We determined the readability of the responses using the Flesch-Kincaid Grade Level (FKGL) and Flesch Reading Ease (FRE) score. Additionally, we measured the models' response time. We compared the performance using the Mann-Whitney U test and Student's t-test. <i>Results:</i> ChatGPT-4 significantly outperformed Gemini in providing accurate (3.59 ± 0.84 vs. 3.13 ± 0.83, <i>p</i>-value = 0.022) and relevant (2.28 ± 0.77 vs. 1.88 ± 0.83, <i>p</i>-value = 0.032) responses. Conversely, Gemini provided more concise and readable responses, with an average FKGL (12.80 ± 1.56) significantly lower than ChatGPT-4's (15.00 ± 1.89) (<i>p</i> < 0.0001). However, there was no difference in the FRE scores (<i>p</i> = 0.174). Moreover, Gemini's average response time was significantly faster (8.15 ± 1.42 s) than ChatGPT-4's (13.70 ± 2.87 s) (<i>p</i> < 0.0001). <i>Conclusions:</i> Although ChatGPT-4 provided more accurate and relevant responses, both models demonstrated potential as intraoperative tools. Nevertheless, their performance inconsistency across the different procedures underscores the need for further training and optimization to ensure their reliability as intraoperative decision-support tools.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations