This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

A Dataset for Evaluating Large Language Models on Chinese National Medical Licensing Examinations

2026 · 0 citations · Scientific Data · Open Access

Abstract

Large language models (LLMs) are increasingly applied in medical education, question answering, and clinical reasoning, yet standardized datasets in non-English contexts remain limited. To address this gap, we present CNMLEQA, a benchmark dataset for evaluating LLMs on the Chinese National Medical Licensing Examination. The dataset integrates question-answer pairs from three sources: PubMed, GitHub, and MedExamLLM. CNMLEQA comprises two subsets, CNMLEQA-10k (9,890 questions) and CNMLEQA-3k (2,949 questions), each consisting of multiple-choice questions with five options and one correct answer. Questions are annotated with key dimensions including (1) question type (knowledge-based or case-based), (2) auxiliary metadata such as examination year, and (3) clinical scenario information across five dimensions: disease or diagnosis, surgery, medication, laboratory examination, and symptom or sign. Annotation was conducted by clinical experts. To validate the dataset, we evaluated state-of-the-art LLMs including Gemini, DeepSeek, GPT, Qwen, and LLaMA, and conducted fine-tuning experiments on Qwen models. Results show that Qwen2.5-32B achieved an accuracy of 90.88% on CNMLEQA-10k, while DeepSeek-R1 achieved an accuracy of 91.59% on CNMLEQA-3k. The fine-tuning experiments further demonstrated significant performance improvements. CNMLEQA provides a multidimensional, clinically grounded benchmark for advancing LLM evaluation in Chinese medical applications.
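As a rough illustration of the structure described in the abstract, the sketch below models one five-option question with its annotations and computes accuracy as the fraction of exact answer matches. The class name, field names, and the `accuracy` helper are hypothetical and chosen for illustration only; the released dataset's actual schema and the authors' evaluation code may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Question:
    """One multiple-choice item (field names are hypothetical, not the dataset's schema)."""
    stem: str
    options: dict[str, str]            # five options, assumed to be keyed "A".."E"
    answer: str                        # key of the single correct option
    question_type: str                 # "knowledge" or "case"
    exam_year: Optional[int] = None    # auxiliary metadata, may be missing
    # Clinical scenario annotations, e.g. {"medication": [...], "symptom_or_sign": [...]}
    scenario: dict[str, list[str]] = field(default_factory=dict)


def accuracy(questions: list[Question], predictions: list[str]) -> float:
    """Return the fraction of predictions that exactly match the correct option key."""
    if len(questions) != len(predictions):
        raise ValueError("questions and predictions must have the same length")
    if not questions:
        raise ValueError("cannot compute accuracy on an empty question list")
    correct = sum(q.answer == p for q, p in zip(questions, predictions))
    return correct / len(questions)
```

Under these assumptions, a reported accuracy such as 90.88% would correspond to `accuracy(...)` returning roughly 0.9088 over the questions of the relevant subset.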

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
