This is an overview page with metadata for this scientific work. The full article is available from the publisher.
The Responses of Artificial Intelligence to Questions About Urological Emergencies: A Comparison of 3 Different Large Language Models
Citations: 0
Authors: 4
Year: 2025
Abstract
Objective: This study aimed to compare the accuracy and adequacy of responses to fundamental questions about urological emergencies provided by three different large language models (LLMs) built on artificial intelligence technology.

Material and Methods: Nine distinct urological emergency topics were identified, and seven fundamental questions were formulated for each topic (two on diagnosis, three on disease management, and two on complications), yielding 63 questions in total. The questions were posed in English to three free AI platforms with different underlying infrastructures (ChatGPT-4, Google Gemini 2.0 Flash, and Meta Llama 3.2), and the responses were documented. The answers were scored by the authors on a scale of 1 to 4 for accuracy and adequacy, and the results were compared using statistical analysis.

Results: Across all question-answer pairs, ChatGPT achieved slightly higher accuracy than Gemini and Meta Llama; however, no statistically significant differences were detected among the groups (3.8 ± 0.5, 3.7 ± 0.6, and 3.7 ± 0.5, respectively; p=0.146). When questions on diagnosis, treatment management, and complications were evaluated separately, no statistically significant differences were detected among the three LLMs (p=0.338, p=0.289, and p=0.407, respectively). Only one response, provided by Gemini, was completely incorrect (1.6%). No misleading or incorrect answers were observed for the diagnosis-related questions on any of the three platforms. In total, misleading answers were observed for two questions (3.2%) with ChatGPT, three questions (4.7%) with Gemini, and two questions (3.2%) with Meta Llama.

Conclusion: LLMs predominantly provide accurate answers to basic and straightforward questions about urological emergencies, where prompt treatment is critical. Although no significant differences were observed among the responses of the three LLMs compared in this study, the presence of misleading and incorrect answers should be carefully considered, given the evolving nature and limitations of this technology.

Keywords: urological emergencies, artificial intelligence, large language models
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations