This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Real-world evaluation of large language models in detecting drug-related problems: A clinical pharmacist–AI concordance study in hematology care
Citations: 0
Authors: 4
Year: 2026
Abstract
Introduction: Large language models (LLMs) offer potential as clinical decision support systems (CDSS) for detecting drug-related problems (DRPs), yet their real-world performance compared to clinical pharmacists (CPs) remains unclear, especially in complex hematology care. We aimed to evaluate the concordance between a clinical pharmacist and three LLMs in identifying DRPs within a Bone Marrow Transplantation unit.

Methods: This prospective observational study evaluated the concordance between a CP and three LLMs (ChatGPT-4o, Grok-3, DeepSeek-v3) in a Bone Marrow Transplantation unit. Eighty-three anonymized patient cases encompassing 210 CP-identified DRPs, classified via the PCNE v9.1 system, were presented using a standardized CDSS-simulating prompt. Performance was assessed based on direct detection, prompted detection after structured follow-up, and the clinical relevance of AI-generated therapeutic recommendations against the CP's gold-standard assessments.

Results: Direct detection of intervention-requiring DRPs was limited (51.4%-60.5% across models), with nearly half missed initially. Guided prompting significantly improved overall detection rates to 93.8%-98.1%, with ChatGPT achieving the highest accuracy. All models produced hallucinations. Recommendation concordance with the CP exceeded 70% in most DRP categories. DeepSeek and ChatGPT showed more consistent performance in context-dependent evaluations, whereas Grok demonstrated higher direct detection but lower recommendation alignment. LLMs demonstrate meaningful potential to assist in DRP detection but are not sufficiently reliable as standalone tools. Expert-guided interaction substantially enhanced their performance, underscoring the critical value of hybrid pharmacist-AI workflows.

Conclusion: Future research should validate these findings across broader populations with multiple expert evaluators and integrate next-generation AI architectures for safer CDSS implementation.