This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluating the accuracy and patient perception of AI-generated answers on pectus surgery
Citations: 0 · Authors: 4 · Year: 2025
Abstract
OBJECTIVE: This study aims to evaluate the quality of responses provided by the AI systems ChatGPT-4o and Gemini 1.5 to frequently asked patient questions regarding pectus deformity. METHOD: In this cross-sectional survey study, 36 frequently asked questions about pectus surgery were posed using a new Google account with no search history. Responses were recorded from the ChatGPT-4o and Gemini 1.5 AI programs and rated by 10 surgeons specialized in pectus surgery on a rating scale for relevance, accuracy, clarity, and completeness. The intraclass correlation coefficient (ICC) was used for interrater reliability (IRR) analysis of the evaluators' ratings. RESULTS: The average relevance score was 4.79 for ChatGPT-4o and 4.86 for Gemini 1.5. For accuracy, ChatGPT-4o averaged 4.59 and Gemini 1.5 averaged 4.64. For clarity, ChatGPT-4o averaged 4.61 and Gemini 1.5 averaged 4.75. No statistically significant difference was found between the two models' responses in terms of relevance, accuracy, clarity, or completeness, and no statistical difference emerged in the IRR analyses during the evaluation process. CONCLUSION: Based on this study, we believe that AI applications have high potential for informing patients about pectus surgery. However, they cannot replace professional medical advice, and patients should therefore consult experts to verify AI responses.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,644 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,550 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,061 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,850 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations