OpenAlex · Updated hourly · Last updated: 30.03.2026, 17:38

This is an overview page with metadata on this scholarly work. The full article is available from the publisher.

Evaluating Artificial Intelligence Conversational Platforms for Parental Queries on Antenatal Hydronephrosis: A Comparative Blinded Assessment

2026 · 0 Citations · Journal of Indian Association of Pediatric Surgeons · Open Access
Open full text at the publisher

Citations: 0 · Authors: 7 · Year: 2026

Abstract

Background: Antenatal hydronephrosis is the most common fetal anomaly detected on routine ultrasound. Parents often seek immediate information online and turn to artificial intelligence (AI) conversational platforms, which are easily available on mobile devices. The accuracy and reliability of these responses in sensitive pediatric surgical contexts are unknown.

Objectives: To compare the quality of responses generated by ChatGPT, Gemini, and Claude to standardized parent-relevant questions on antenatal hydronephrosis.

Materials and Methods: Five key questions were developed and vetted by three independent pediatric surgeons. To minimize bias, an independent person with a nonmedical background posed these questions to ChatGPT, Gemini, and Claude using a standardized background scenario. Responses were documented verbatim. Three pediatric surgeons then independently assessed each response for veracity, clarity, comprehensiveness, tone, and safety, using a 5-point Likert scale. Scores were analyzed using descriptive statistics, analysis of variance (ANOVA)/Kruskal–Wallis for inter-platform comparison, and intraclass correlation (ICC) for inter-rater reliability.

Results: Across five assessment parameters, mean scores for the three platforms ranged between 3.53 and 4.13 on a 5-point scale. No statistically significant inter-platform differences were identified by ANOVA or Kruskal–Wallis tests (all P > 0.05). Inter-rater reliability was limited, with ICC (2, k) values ranging from 0.00 (poor) to 0.60 (moderate), indicating variability in expert interpretation of AI-generated responses.

Conclusions: The three AI conversational platforms produced broadly comparable outputs. While responses were generally clear, reassuring, and safe, the poor-to-moderate inter-rater agreement underscores heterogeneity in expert appraisal. These findings highlight that AI platforms should be considered adjuncts rather than substitutes for professional counseling, and their evolving nature warrants ongoing evaluation.
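
The abstract does not include the underlying analysis code. The following is a minimal sketch, assuming a long-format table of Likert ratings, of how the inter-platform Kruskal–Wallis comparison and the ICC(2, k) inter-rater reliability described above could be computed with scipy and pingouin. The score values, the DataFrame layout, and the item labels are hypothetical placeholders, not data from the study.

```python
# Minimal sketch (not the authors' code) of the statistical comparison described
# above: Kruskal-Wallis across platforms and ICC(2, k) for inter-rater reliability.
# All score values below are illustrative placeholders, not data from the study.
import pandas as pd
from scipy.stats import kruskal
import pingouin as pg

# Hypothetical 5-point Likert ratings: 3 raters x 5 questions per platform.
ratings = pd.DataFrame({
    "platform": ["ChatGPT"] * 15 + ["Gemini"] * 15 + ["Claude"] * 15,
    "question": list(range(1, 6)) * 9,
    "rater":    ([1] * 5 + [2] * 5 + [3] * 5) * 3,
    "score":    [4, 4, 3, 5, 4, 3, 4, 4, 4, 5, 4, 3, 4, 4, 4,
                 4, 5, 3, 4, 4, 4, 4, 3, 5, 4, 3, 4, 4, 4, 4,
                 5, 4, 4, 3, 4, 4, 4, 5, 3, 4, 4, 4, 4, 3, 5],
})

# Inter-platform comparison: Kruskal-Wallis on per-response scores.
groups = [g["score"].values for _, g in ratings.groupby("platform")]
h_stat, p_value = kruskal(*groups)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.3f}")

# Inter-rater reliability: ICC(2, k) treats raters as random effects and reports
# the reliability of their averaged ratings (pingouin labels this row "ICC2k").
# Each platform-question response is one rated target.
icc = pg.intraclass_corr(
    data=ratings.assign(item=ratings["platform"] + "-Q" + ratings["question"].astype(str)),
    targets="item", raters="rater", ratings="score",
)
print(icc.loc[icc["Type"] == "ICC2k", ["Type", "ICC", "CI95%"]])
```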

Topics

Artificial Intelligence in Healthcare and Education · Fetal and Pediatric Neurological Disorders · Pediatric Urology and Nephrology Studies