This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Putting <scp>AI</scp> Chatbots to the Test: A Comparative Analysis of Large Language Models' Performance in the Context of Basal Cell Carcinoma
Citations: 0
Authors: 12
Year: 2025
Abstract
Large language models (LLMs) have been explored in various dermato-oncological conditions. In this study, we aimed to compare different LLMs' potential to guide clinicians on the treatment of basal cell carcinoma (BCC). Four authors formulated 24 questions on the topic of clinical management of BCC. The blinded responses of three LLMs (Gemini, Copilot and ChatGPT 4.0) were presented to a panel of nine dermato-oncologists for assessment of (i) factual accuracy, (ii) concision, (iii) comprehensiveness and (iv) overall preference. In addition, the responses were then quantitatively compared based on lexical (i.e., vocabulary) and semantic (i.e., meaning) similarity to three additional LLMs (ChatGPT 3.5, ChatGPT 4o and Claude). ChatGPT 4.0 had the highest accuracy rate (87.5%, i.e., 21/24 responses), followed by Gemini (50%) and Copilot (25%). All models scored lower for concision and comprehensiveness, with ChatGPT 4.0 in the lead (62.5% comprehensive; 54.2% concise), followed by Gemini (33.3%; 12.5%) and Copilot (16.7%; 8.3%). The panel achieved consensus on model preference in 16 questions (ChatGPT 4.0: 54.2%; Gemini: 8.3%; Copilot: 4.2%; no consensus: 33.3%). While the lexical similarity was found to be low (x̄ ~0.07-0.10 across models), the semantic similarity between the LLM responses was moderate (x̄ ~0.60-0.70 across models). LLMs may assist clinicians in settings where expert dermato-oncological guidance is not readily available, with ChatGPT 4.0 currently outperforming both Gemini and Copilot. Since quantitative methods are unable to detect clinically relevant differences between LLMs, surveying dermatologists is necessary to identify useful models in this rapidly developing field.
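The abstract contrasts lexical similarity (shared vocabulary) with semantic similarity (shared meaning) between LLM responses, finding the former low and the latter moderate. The paper's exact implementation is not stated here; the following is a minimal sketch of the distinction, assuming a Jaccard word-overlap measure for the lexical side and cosine similarity over embedding vectors for the semantic side (in practice the vectors would come from a sentence-embedding model, which is omitted to keep the example self-contained). The example sentences are illustrative, not drawn from the study.

```python
import math

def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased word sets: a simple vocabulary-based measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(u, v) -> float:
    """Cosine similarity between two vectors; with sentence embeddings this
    approximates semantic similarity rather than word overlap."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Two hypothetical LLM answers that mean roughly the same thing
# but share relatively few exact word forms:
r1 = "Mohs surgery offers a very high cure rate for facial BCC."
r2 = "For basal cell carcinoma of the face, Mohs micrographic surgery achieves excellent cure rates."
print(lexical_similarity(r1, r2))  # modest word overlap despite similar meaning
```

The gap the study reports (x̄ ~0.07-0.10 lexical vs. ~0.60-0.70 semantic) reflects exactly this effect: models phrase equivalent clinical advice with different vocabulary, so only meaning-level measures register the agreement.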
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations