This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Grounded large language models for diagnostic prediction in real-world emergency department settings
Citations: 2
Authors: 10
Year: 2025
Abstract
Objective: To evaluate the predictive diagnostic performance of open- and closed-source large language models (LLMs) in emergency medicine, addressing the urgent need for innovative clinical decision support tools amid rising patient volumes and staffing shortages. Materials and Methods: We generated 2370 AI-driven diagnostic predictions (Top-5 diagnoses from each of 6 model pipelines per patient), using data from 79 real-world emergency department cases collected consecutively during a 24-hour peak influx period at a tertiary care center. Pipelines combined open- and closed-source embedding models (text-embedding-ada-002, MXBAI) with foundational models (GPT-4, Llama3, and Qwen2) grounded via retrieval-augmented generation using emergency medicine textbooks. The models' predictions were assessed against reference diagnoses established by expert consensus. Results: [...] < 1.4e-12), with the MXBAI/Qwen2 pipeline achieving perfect citation verification. Discussion: Diagnostic accuracy depended primarily on case characteristics rather than on the choice of model pipeline, highlighting fundamental AI alignment challenges in clinical reasoning. Low performance in unspecific diagnoses underscores inherent complexities in clinical definitions rather than technological shortcomings alone. Conclusion: Open-source LLM pipelines provide enhanced sourcing capabilities, which are crucial for transparent clinical decision-making and interpretability. Further research should expand knowledge bases to include hospital guidelines and regional epidemiology, while exploring on-premises solutions to better align with privacy regulations and clinical integration.
Institutions
- Cliniques Universitaires Saint-Luc (BE)
- New York State Economic Development Council (US)
- PricewaterhouseCoopers (United States) (US)
- Hôpital de Jolimont (BE)
- CHU Dinant Godinne UCL Namur (BE)
- Institute of Information and Communication Technologies (BG)
- UCLouvain (BE)
- University of Cambridge (GB)
- Cambridge School (PT)
- BioElectronics (United States) (US)
- Centre hospitalier régional de la Citadelle (BE)
- Neurological Surgery (US)