OpenAlex · Updated hourly · Last updated: 26.03.2026, 05:20

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance of ChatGPT-o1 and DeepSeek-R1 on the health law related questions in the Chinese national licensing examination: a comparative study

2026 · 0 citations · Frontiers in Education · Open Access
Open full text at the publisher

Citations: 0
Authors: 8
Year: 2026

Abstract

Background: This study compared the performance of two advanced large language models (LLMs), DeepSeek-R1 and ChatGPT-o1, on health law–related questions from the Chinese National Medical Licensing Examination (CNMLE), thereby evaluating their applicability to medical education in non-English contexts.

Methods: A total of 400 health law questions were randomly selected from the official CNMLE guidebook. Each question was independently administered to DeepSeek-R1 and ChatGPT-o1 via standardized Application Programming Interface (API) prompts to minimize hallucination and memory effects. Model responses were compared against the official answers, and statistical analyses were conducted using McNemar's test, with p < 0.05 indicating significance.

Results: DeepSeek-R1 achieved an overall accuracy of 93.5% (374/400), significantly higher than ChatGPT-o1's 79.5% (318/400, p < 0.001). Subgroup analysis revealed that DeepSeek-R1 consistently outperformed ChatGPT-o1 across most legal categories, including medical institutions, infectious disease prevention, malpractice liability, and pharmaceutical regulation. The two models performed comparably in categories such as blood donation and maternal–child health law, and DeepSeek-R1 achieved perfect accuracy in smaller domains such as public health emergencies and occupational disease control.

Conclusion: DeepSeek-R1 outperformed ChatGPT-o1 in answering health law questions from the CNMLE, highlighting its potential as a reliable tool for medical education in China. The findings underscore the influence of linguistic and cultural context on LLM performance. Future work should extend the evaluation to open-ended and case-based questions and explore fine-tuning strategies to improve accuracy in healthcare settings.
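The Methods describe scoring each model's answer per question against the official key and comparing the paired outcomes with McNemar's test. A minimal sketch of that comparison in Python follows; note that the abstract reports only the marginal accuracies (374/400 vs. 318/400), so the discordant-pair split below (b = 60, c = 4) is a hypothetical assumption chosen to be consistent with those marginals, not a figure from the paper.

```python
from math import comb

# Hypothetical discordant-pair counts over the 400 paired questions
# (assumed split; only the marginals 374/400 and 318/400 are reported):
b = 60  # questions DeepSeek-R1 answered correctly but ChatGPT-o1 did not
c = 4   # questions ChatGPT-o1 answered correctly but DeepSeek-R1 did not

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value: under H0, each discordant pair
    is equally likely to favor either model (binomial with p = 0.5)."""
    n = b + c
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar chi-square statistic with continuity correction (1 df)."""
    return (abs(b - c) - 1) ** 2 / (b + c)

statistic = mcnemar_chi2(b, c)
p_value = mcnemar_exact(b, c)
print(f"chi2 = {statistic:.3f}, exact p = {p_value:.2e}")
```

Only the discordant pairs enter the test; the questions both models got right (or both got wrong) carry no information about which model is better, which is why McNemar's test suits this paired design better than a two-sample proportion test.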

Topics

Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Social Media in Health Education