This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A Brief Review on Benchmarking for Large Language Models Evaluation in Healthcare

2025 · 8 citations · Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery · Open Access
Citations: 8 · Authors: 7 · Year: 2025

Abstract

This paper reviews benchmarking methods for evaluating large language models (LLMs) in healthcare settings. It highlights the importance of rigorous benchmarking to ensure LLMs' safety, accuracy, and effectiveness in clinical applications. The review also discusses the challenges of developing standardized benchmarks and metrics tailored to healthcare-specific tasks such as medical text generation, disease diagnosis, and patient management. Ethical considerations, including privacy, data security, and bias, are also addressed, underscoring the need for multidisciplinary collaboration to establish robust benchmarking frameworks that facilitate LLMs' reliable and ethical use in healthcare. Evaluation of LLMs remains challenging due to the lack of standardized healthcare-specific benchmarks and comprehensive datasets. Key concerns include patient safety, data privacy, model bias, and the need for better explainability, all of which affect the overall trustworthiness of LLMs in clinical settings.

Similar works