OpenAlex · Updated hourly · Last updated: 15 March 2026, 05:13

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Real-world evaluation of large language models in detecting drug-related problems: A clinical pharmacist–AI concordance study in hematology care

2026 · 0 citations · Journal of Oncology Pharmacy Practice

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Introduction: Large language models (LLMs) offer potential as clinical decision support systems (CDSS) for detecting drug-related problems (DRPs), yet their real-world performance compared to clinical pharmacists (CPs) remains unclear, especially in complex hematology care. We aimed to evaluate the concordance between a clinical pharmacist and three LLMs in identifying DRPs within a Bone Marrow Transplantation unit.

Methods: This prospective observational study evaluated the concordance between a CP and three LLMs (ChatGPT-4o, Grok-3, DeepSeek-v3) in a Bone Marrow Transplantation unit. Eighty-three anonymized patient cases encompassing 210 CP-identified DRPs, classified via the PCNE v9.1 system, were presented using a standardized CDSS-simulating prompt. Performance was assessed based on direct detection, prompted detection after structured follow-up, and the clinical relevance of AI-generated therapeutic recommendations against the CP's gold-standard assessments.

Results: Direct detection of intervention-requiring DRPs was limited (51.4%-60.5% across models), with nearly half missed initially. Guided prompting significantly improved overall detection rates to 93.8%-98.1%, with ChatGPT achieving the highest accuracy. All models produced hallucinations. Recommendation concordance with the CP exceeded 70% in most DRP categories. DeepSeek and ChatGPT showed more consistent performance in context-dependent evaluations, whereas Grok demonstrated higher direct detection but lower recommendation alignment. LLMs demonstrate meaningful potential to assist in DRP detection but are not sufficiently reliable as standalone tools. Expert-guided interaction substantially enhanced their performance, underscoring the critical value of hybrid pharmacist-AI workflows.

Conclusion: Future research should validate these findings across broader populations with multiple expert evaluators and integrate next-generation AI architectures for safer CDSS implementation.

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling