This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
¿Es bueno confiar en recomendaciones de la inteligencia artificial basadas en guías clínicas? [Is it good to trust artificial intelligence recommendations based on clinical guidelines?]
Citations: 0
Authors: 10
Year: 2025
Abstract
Introduction. Medical staff often face difficulties in consulting and applying clinical guidelines in practice. Large language models, especially when combined with retrieval-augmented generation, may help overcome these challenges by producing context-specific outputs with improved adherence to medical guidelines.
Objectives. To assess the performance of commercial large language models in answering maternal health questions within retrieval-augmented generation systems, using both human and automated evaluation metrics.
Material and methods. A controlled experiment was designed to obtain accurate, consistent answers from a retrieval-augmented generation system based on Colombian maternal care guidelines. A physician formulated ten questions and defined the ground-truth answers. Various large language models were tested with a standardized prompt and evaluated through binary answer–concept ranking and retrieval-augmented generation assessment metrics, judged by two independent large language models.
Results. Generative pre-trained transformer 3.5 (GPT-3.5) achieved the highest physician-assessed accuracy (0.90). Claude 3.5 obtained the top faithfulness score (0.78) under GPT-4o evaluation, while Mistral ranked highest (0.84) under Claude 3.5 evaluation. Regarding answer relevance, GPT-3.5 scored highest under both judges (0.94 and 0.86).
Conclusions. Integrating retrieval-augmented generation into obstetric care has the potential to enhance evidence-based practices and improve patient outcomes. However, rigorous validation of accuracy and context-specific reliability is essential before clinical deployment. The findings of this study indicate that large-scale models (e.g., GPT-3.5, Claude, Llama 70B) consistently outperform lighter models such as Llama 8B.
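The abstract does not state how the physician-assessed accuracy was computed. As a minimal sketch, assuming it is simply the fraction of the ten questions whose answers were judged correct, a score of 0.90 would correspond to nine correct answers out of ten. The function name `binary_accuracy` and the judgment list below are hypothetical and are not taken from the study.

```python
def binary_accuracy(judgments):
    """Return the fraction of answers judged correct.

    `judgments` is a sequence of 0/1 values, one per question
    (1 = judged correct, 0 = judged incorrect). This scoring rule
    is an assumption, not the study's documented procedure.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    if any(j not in (0, 1) for j in judgments):
        raise ValueError("judgments must be 0 or 1")
    return sum(judgments) / len(judgments)


# Hypothetical judgments for ten questions: nine correct, one incorrect.
example_judgments = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(binary_accuracy(example_judgments))  # prints 0.9
```

With only ten questions, each answer shifts the score by 0.10, so small differences between models should be read with caution.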