This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluation of large language models’ ability to identify clinically relevant drug-drug interactions and generate high-quality clinical pharmacotherapy recommendations
Citations: 5
Authors: 8
Year: 2025
Abstract
PURPOSE: Large language models (LLMs) are promising artificial intelligence (AI) tools to support clinical decision-making. The ability of LLMs to evaluate medication regimens, identify drug-drug interactions (DDIs), and provide clinical recommendations has undergone limited evaluation. The purpose of this study was to compare the performance of 3 LLMs in recognizing DDIs, determining clinical relevance, and generating management recommendations. METHODS: A total of 15 patient cases with medication regimens were created; each contained a commonly encountered DDI. Two separate study phases were developed: (1) DDI identification and determination of clinical relevance; and (2) DDI identification and generation of a clinical recommendation. The primary outcome was the ability of the LLMs (GPT-4, Gemini 1.5, and Claude 3) to identify the DDI within each medication regimen. Secondary outcomes included the ability of the LLMs to identify the clinical relevance of each DDI and to generate a recommendation of high quality relative to ground truth. RESULTS: Claude 3 identified all DDIs (15/15, 100%), followed by GPT-4 (14/15, 93.3%) and Gemini 1.5 (12/15, 80.0%). All LLMs were significantly more likely than clinical experts to categorize a DDI as clinically relevant (P < 0.01). DDI management recommendations provided by GPT-4 were rated as optimal in 8 of 13 (61.5%) cases (P = 0.05 for comparison to ground truth). Two recommendations from GPT-4 and one recommendation from Gemini 1.5 were deemed likely to result in potential patient harm. CONCLUSION: While LLMs demonstrate promising potential to identify DDIs, their application to clinical cases requires ongoing development. Findings from this study may assist in the future development and refinement of LLMs for clinical decision-making related to DDIs.
Similar works
"Why Should I Trust You?"
2016 · 14,615 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,529 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,883 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,451 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,948 citations