OpenAlex · Updated hourly · Last updated: 11.03.2026, 17:46

This is an overview page with metadata about this scientific work. The full article is available from the publisher.

Reframing Clinical AI Evaluation in the Era of Generative Models: Toward Multidimensional, Stakeholder-Informed, and Safety-Centric Frameworks for Real-World Health Care Deployment

2025 · 0 citations · Premier journal of science · Open Access

Citations: 0 · Authors: 10 · Year: 2025

Abstract

The integration of artificial intelligence (AI), in the form of large language models (LLMs) and other generative models, into clinical practice has outpaced the metrics available to measure performance in real-world settings. Traditional benchmarks such as the area under the receiver operating characteristic curve or bilingual evaluation understudy (BLEU) scores fail to capture clinical nuance, patient safety, explainability, and workflow integration. This scoping review maps the evolving landscape of clinical AI evaluation, drawing on academic and industry frameworks, including clinical risk evaluation of LLMs for hallucination and omission (CREOLA), hospital deployments, and reviews of radiological tools. We explore stakeholder tensions among academia, business viability, regulation, and frontline usability, and show how these perspectives produce competing evaluation imperatives. In particular, we highlight the novel challenges posed by generative models: hallucination, omission, narrative incoherence, and epistemic misalignment. We argue that a layered, stakeholder-engaged evaluation strategy must integrate risk stratification, contextual awareness, and continuous postdeployment surveillance. Equity, interpretability, and clinician trust are treated not as footnotes but as central pillars on which evaluation is built. This review offers a synthesizing overview of how health systems, developers, and regulators can co-construct adaptive, ethically grounded evaluation frameworks, ensuring that AI tools enhance, rather than erode, clinical judgment, patient safety, and health equity in real-world care.
