This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

EMPEC: A Comprehensive Benchmark for Evaluating Large Language Models Across Diverse Healthcare Professions

2025 · 1 citation · Open Access

Abstract

Recent advancements in Large Language Models (LLMs) show their potential in accurately answering biomedical questions, yet current healthcare benchmarks primarily assess knowledge mastered by medical doctors, neglecting other essential professions. To address this gap, we introduce the Examinations for Medical PErsonnel in Chinese (EMPEC), a comprehensive healthcare knowledge benchmark featuring 157,803 exam questions across 124 subjects and 20 healthcare professions, including underrepresented roles like Optometrists and Audiologists. Each question is tagged for release time and source authenticity. We evaluated 17 LLMs, including proprietary and open-source models, finding that while models like GPT-4 achieved over 75% accuracy, they struggled with specialised fields and alternative medicine. Notably, we find that most medical-specific LLMs underperform their general-purpose counterparts in EMPEC, and incorporating EMPEC's data in fine-tuning improves performance. In addition, we tested LLMs on questions released after the completion of their training to examine their ability on unseen queries. We also translated the test set into English and simplified Chinese and analysed the impact on different models. Our findings emphasise the need for broader benchmarks to assess LLM applicability in real-world healthcare, and we will provide the dataset and evaluation toolkit for future research. Our data and code are available at https://github.com/zhehengluoK/eval_empec.

Recent advancements in Large Language Models (LLMs) have demonstrated the potential of LLM-based Artificial Intelligence (AI) in providing accurate answers to questions about world knowledge.

Topics

Artificial Intelligence in Healthcare and Education
