This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Generative Artificial Intelligence Models in Clinical Infectious Disease Consultations: A Cross-Sectional Analysis Among Specialists and Resident Trainees
Citations: 1
Authors: 11
Year: 2025
Abstract
Background/Objectives: The potential of generative artificial intelligence (GenAI) to augment clinical consultation services in clinical microbiology and infectious diseases (ID) is being evaluated. Methods: This cross-sectional study evaluated the performance of four GenAI chatbots (GPT-4.0, a Custom Chatbot based on GPT-4.0, Gemini Pro, and Claude 2) by analysing 40 unique clinical scenarios. Six specialists and resident trainees from clinical microbiology or ID units conducted randomised, blinded evaluations across factual consistency, comprehensiveness, coherence, and medical harmfulness. Results: Analysis showed that GPT-4.0 achieved significantly higher composite scores compared to Gemini Pro (p = 0.001) and Claude 2 (p = 0.006). GPT-4.0 outperformed Gemini Pro and Claude 2 in factual consistency (Gemini Pro, p = 0.02; Claude 2, p = 0.02), comprehensiveness (Gemini Pro, p = 0.04; Claude 2, p = 0.03), and the absence of medical harm (Gemini Pro, p = 0.02; Claude 2, p = 0.04). Within-group comparisons showed that specialists consistently awarded higher ratings than resident trainees across all assessed domains (p < 0.001) and overall composite scores (p < 0.001). Specialists were five times more likely to consider responses as "harmless". Overall, fewer than two-fifths of AI-generated responses were deemed "harmless". Post hoc analysis revealed that specialists may inadvertently disregard conflicting or inaccurate information in their assessments. Conclusions: Clinical experience and domain expertise of individual clinicians significantly shaped the interpretation of AI-generated responses. In our analysis, we have demonstrated disconcerting human vulnerabilities in safeguarding against potentially harmful outputs, which seemed to be most apparent among experienced specialists. At the current stage, none of the tested AI models should be considered safe for direct clinical deployment in the absence of human supervision.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,339 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,211 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,614 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,478 citations