This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Putting <scp>AI</scp> Chatbots to the Test: A Comparative Analysis of Large Language Models' Performance in the Context of Basal Cell Carcinoma
Citations: 0
Authors: 12
Year: 2025
Abstract
Large language models (LLMs) have been explored in various dermato-oncological conditions. In this study, we aimed to compare different LLMs' potential to guide clinicians on the treatment of basal cell carcinoma (BCC). Four authors formulated 24 questions on the topic of clinical management of BCC. The blinded responses of three LLMs (Gemini, Copilot and ChatGPT 4.0) were presented to a panel of nine dermato-oncologists for assessment of (i) factual accuracy, (ii) concision, (iii) comprehensiveness and (iv) overall preference. In addition, the responses were then quantitatively compared based on lexical (i.e., vocabulary) and semantic (i.e., meaning) similarity to three additional LLMs (ChatGPT 3.5, ChatGPT 4o and Claude). ChatGPT 4.0 had the highest accuracy rate (87.5%, i.e., 21/24 responses), followed by Gemini (50%) and Copilot (25%). All models scored lower for concision and comprehensiveness, with ChatGPT 4.0 in the lead (62.5% comprehensive; 54.2% concise), followed by Gemini (33.3%; 12.5%) and Copilot (16.7%; 8.3%). The panel achieved consensus on model preference in 16 questions (ChatGPT 4.0: 54.2%; Gemini: 8.3%; Copilot: 4.2%; no consensus: 33.3%). While the lexical similarity was found to be low (x̄ ~0.07-0.10 across models), the semantic similarity between the LLM responses was moderate (x̄ ~0.60-0.70 across models). LLMs may assist clinicians in settings where expert dermato-oncological guidance is not readily available, with ChatGPT 4.0 currently outperforming both Gemini and Copilot. Since quantitative methods are unable to detect clinically relevant differences between LLMs, surveying dermatologists is necessary to identify useful models in this rapidly developing field.
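The abstract contrasts lexical similarity (shared vocabulary) with semantic similarity (shared meaning) between LLM responses, finding the former low and the latter moderate. The paper's exact implementation is not stated here; the following is a minimal sketch of the distinction, assuming a Jaccard word-overlap measure for the lexical side and cosine similarity over embedding vectors for the semantic side (in practice the vectors would come from a sentence-embedding model, which is omitted to keep the example self-contained). The example sentences are illustrative, not drawn from the study.

```python
import math

def lexical_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercased word sets: a simple vocabulary-based measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(u, v) -> float:
    """Cosine similarity between two vectors; with sentence embeddings this
    approximates semantic similarity rather than word overlap."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Two hypothetical LLM answers that mean roughly the same thing
# but share relatively few exact word forms:
r1 = "Mohs surgery offers a very high cure rate for facial BCC."
r2 = "For basal cell carcinoma of the face, Mohs micrographic surgery achieves excellent cure rates."
print(lexical_similarity(r1, r2))  # modest word overlap despite similar meaning
```

The gap the study reports (x̄ ~0.07-0.10 lexical vs. ~0.60-0.70 semantic) reflects exactly this effect: models phrase equivalent clinical advice with different vocabulary, so only meaning-level measures register the agreement.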
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations