This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Solving Complex Pediatric Surgical Case Studies: A Comparative Analysis of Copilot, ChatGPT-4, and Experienced Pediatric Surgeons' Performance
Citations: 4
Authors: 14
Year: 2025
Abstract
The emergence of large language models (LLMs) has led to notable advancements across multiple sectors, including medicine. Yet their impact on pediatric surgery remains largely unexplored. This study aims to assess the ability of the artificial intelligence (AI) models ChatGPT-4 and Microsoft Copilot to propose diagnostic procedures and primary and differential diagnoses, and to answer clinical questions, using complex clinical case vignettes of classic pediatric surgical diseases.

We conducted the study in April 2024. We evaluated the performance of LLMs using 13 complex clinical case vignettes of pediatric surgical diseases and compared their responses to those of a human cohort of experienced pediatric surgeons. Additionally, pediatric surgeons rated the diagnostic recommendations of the LLMs for completeness and accuracy. To determine differences in performance, we performed statistical analyses.

ChatGPT-4 achieved a higher test score (52.1%) than Copilot (47.9%) but a lower score than the pediatric surgeons (68.8%). Overall differences in performance between ChatGPT-4, Copilot, and pediatric surgeons were statistically significant (p < 0.01). ChatGPT-4 demonstrated superior performance in generating differential diagnoses compared to Copilot (p < 0.05). No statistically significant differences were found between the AI models regarding suggestions for diagnostics and primary diagnosis. Overall, the recommendations of the LLMs were rated as average by pediatric surgeons.

This study reveals significant limitations in the performance of AI models in pediatric surgery. Although LLMs exhibit potential across various areas, their reliability and accuracy in handling clinical decision-making tasks are limited. Further research is needed to improve AI capabilities and establish their usefulness in the clinical setting.
Author institutions
- Leipzig University (DE)
- Essen University Hospital (DE)
- University Medical Centre Mannheim (DE)
- Klinikum Bremen-Mitte (DE)
- University Medical Center Hamburg-Eppendorf (DE)
- Universität Hamburg (DE)
- Universitätsklinikum Gießen und Marburg (DE)
- University of Helsinki (FI)
- Helsinki Children's Hospital (FI)
- Karolinska University Hospital (SE)
- Karolinska Institutet (SE)
- Washington University in St. Louis (US)
- Sapienza University of Rome (IT)
- Hospital for Sick Children (CA)