This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Enhancing large language model clinical support information with machine learning risk and explainability: a feasibility study
Citations: 0 · Authors: 6 · Year: 2026
Abstract
BACKGROUND: Current machine learning (ML) prediction models offer limited guidance for individualized, actionable management. Large language models (LLMs) can transform ML model-predicted risk estimates with Shapley Additive Explanations (SHAP) into clinically meaningful support information, yet the added value of incorporating ML-derived data and the relative performance of different LLMs remain uncertain. To address these gaps, we used our previously developed IMPACT framework to evaluate the quality of LLM-generated outputs.

METHODS: In this retrospective analysis of MIMIC-IV v3.1 intensive care unit (ICU) admissions, we applied a previously developed XGBoost model to estimate ICU mortality risk and derive corresponding SHAP values. GPT-4o transformed the predicted mortality risk, clinical predictors, and their SHAP values into risk interpretation, recommended examinations, and management. The primary analysis examined whether augmenting LLM inputs with predicted mortality risk and SHAP values improved clinical response quality, as assessed by the IMPACT framework. We further compared GPT-4o with seven contemporary LLMs; all eight models generated clinical support responses that were scored by Claude 3.7 Sonnet to assess performance differences.

RESULTS: Claude 3.7 Sonnet showed excellent agreement with human IMPACT ratings (intraclass correlation coefficient [ICC] 0.979, 95% CI 0.973-0.984), as did o3-mini (ICC 0.971, 95% CI 0.964-0.980). In the primary analysis, adding predicted ICU mortality risk and SHAP values significantly increased GPT-4o IMPACT scores across prompting strategies. GPT-5 mini (96.0) and gpt-oss-120B (93.4) outperformed GPT-4o (90.4; both p < 0.001) for interpretability and quality.

CONCLUSIONS: Combining ML-derived risk, SHAP explanations, and LLMs may modestly improve ICU clinical support information, while LLM-based evaluators demonstrated feasibility for scalable evaluation of generated clinical content.
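The abstract describes augmenting LLM inputs with the predicted mortality risk, the clinical predictors, and their SHAP values. A minimal sketch of that input-assembly step is shown below: predictors are ranked by absolute SHAP contribution and serialized into a prompt. The function name, prompt wording, and the example predictor values (lactate, age, urine output) are illustrative assumptions, not the study's actual prompt.

```python
# Hedged sketch: serializing an ML risk estimate plus SHAP attributions
# into an LLM prompt. All names, values, and wording are hypothetical.

def build_prompt(risk: float, shap_values: dict[str, float],
                 predictors: dict[str, float]) -> str:
    # Rank predictors by absolute SHAP value (most influential first)
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = [f"Predicted ICU mortality risk: {risk:.1%}",
             "Top contributing predictors:"]
    for name, sv in ranked:
        direction = "increases" if sv > 0 else "decreases"
        lines.append(f"- {name} = {predictors[name]} (SHAP {sv:+.3f}, {direction} risk)")
    lines.append("Provide risk interpretation, recommended examinations, and management.")
    return "\n".join(lines)

# Example with invented predictor values
prompt = build_prompt(
    risk=0.32,
    shap_values={"lactate": 0.41, "age": 0.12, "urine_output": -0.08},
    predictors={"lactate": 4.2, "age": 67, "urine_output": 1500},
)
print(prompt)
```

In practice the SHAP values would come from a tree explainer applied to the fitted XGBoost model; the sketch only covers the formatting step that turns those numbers into LLM-readable context.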
Similar works
The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3)
2016 · 27,307 cit.
pROC: an open-source package for R and S+ to analyze and compare ROC curves
2011 · 13,746 cit.
APACHE II
1985 · 13,596 cit.
Definitions for Sepsis and Organ Failure and Guidelines for the Use of Innovative Therapies in Sepsis
1992 · 13,181 cit.
The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure
1996 · 11,504 cit.