This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Evaluating Artificial Intelligence Conversational Platforms for Parental Queries on Antenatal Hydronephrosis: A Comparative Blinded Assessment
0
Citations
7
Authors
2026
Year
Abstract
Background: Antenatal hydronephrosis is the most common fetal anomaly detected on routine ultrasound. Parents often seek immediate information online and turn to artificial intelligence (AI) conversational platforms, which are easily available on mobile. The accuracy and reliability of these responses in sensitive pediatric surgical contexts are unknown.

Objectives: To compare the quality of responses generated by ChatGPT, Gemini, and Claude to standardized parent-relevant questions on antenatal hydronephrosis.

Materials and Methods: Five key questions were developed and vetted by three independent pediatric surgeons. To minimize bias, an independent person with a nonmedical background posed these questions to ChatGPT, Gemini, and Claude using a standardized background scenario. Responses were documented verbatim. Three pediatric surgeons then independently assessed each response for veracity, clarity, comprehensiveness, tone, and safety, using a 5-point Likert scale. Scores were analyzed using descriptive statistics, analysis of variance (ANOVA)/Kruskal–Wallis for inter-platform comparison, and intraclass correlation (ICC) for inter-rater reliability.

Results: Across five assessment parameters, mean scores for the three platforms ranged between 3.53 and 4.13 on a 5-point scale. No statistically significant inter-platform differences were identified by ANOVA or Kruskal–Wallis tests (all P > 0.05). Inter-rater reliability was limited, with ICC (2, k) values ranging from 0.00 (poor) to 0.60 (moderate), indicating variability in expert interpretation of AI-generated responses.

Conclusions: The three AI conversational platforms produced broadly comparable outputs. While responses were generally clear, reassuring, and safe, the poor-to-moderate inter-rater agreement underscores heterogeneity in expert appraisal. These findings highlight that AI platforms should be considered adjuncts rather than substitutes for professional counseling, and their evolving nature warrants ongoing evaluation.
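The abstract names Kruskal–Wallis for inter-platform comparison and ICC(2, k) for inter-rater reliability. A minimal sketch of how such an analysis might look in Python, using invented example ratings on a 5-point Likert scale (not the study's actual data); the ICC(2, k) formula follows the standard two-way random-effects, average-measures definition:

```python
import numpy as np
from scipy import stats

# Hypothetical ratings (NOT the study's data): five responses per platform,
# scored on a 5-point Likert scale.
chatgpt = [4, 4, 5, 3, 4]
gemini = [3, 4, 4, 4, 5]
claude = [4, 3, 4, 5, 4]

# Kruskal-Wallis H-test for inter-platform comparison.
h_stat, p_value = stats.kruskal(chatgpt, gemini, claude)


def icc2k(ratings):
    """ICC(2, k): two-way random effects, average of k raters.

    ratings: n targets (rows) x k raters (columns).
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * ((ratings.mean(axis=1) - grand_mean) ** 2).sum() / (n - 1)
    ms_cols = n * ((ratings.mean(axis=0) - grand_mean) ** 2).sum() / (k - 1)
    ss_total = ((ratings - grand_mean) ** 2).sum()
    ms_error = (ss_total - ms_rows * (n - 1) - ms_cols * (k - 1)) / (
        (n - 1) * (k - 1)
    )
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Hypothetical matrix: 5 responses (rows) rated by 3 surgeons (columns).
ratings = np.array([
    [4, 4, 5],
    [3, 4, 4],
    [4, 3, 4],
    [5, 4, 4],
    [4, 4, 3],
])
icc = icc2k(ratings)
```

With only three raters and five items per cell, such estimates are noisy, which is consistent with the wide ICC range (0.00 to 0.60) the abstract reports.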
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations