This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Performance of ChatGPT-o1 and DeepSeek-R1 on the health law related questions in the Chinese national licensing examination: a comparative study
Citations: 0
Authors: 8
Year: 2026
Abstract
Background: This study aimed to compare the performance of two advanced large language models (LLMs), DeepSeek-R1 and ChatGPT-o1, in answering health law–related questions from the Chinese National Medical Licensing Examination (CNMLE), thereby evaluating their applicability to medical education in non-English contexts.

Methods: A total of 400 health law questions were randomly selected from the official CNMLE guidebook. Each question was independently administered to DeepSeek-R1 and ChatGPT-o1 via standardized Application Programming Interface (API) prompts to minimize hallucination and memory effects. Model responses were compared against the official answers, and statistical analyses were conducted using McNemar's test, with p < 0.05 indicating significance.

Results: DeepSeek-R1 achieved an overall accuracy of 93.5% (374/400), significantly higher than ChatGPT-o1's 79.5% (318/400, p < 0.001). Subgroup analysis revealed that DeepSeek-R1 consistently outperformed ChatGPT-o1 across most legal categories, including medical institutions, infectious disease prevention, malpractice liability, and pharmaceutical regulation. Both models performed comparably in categories such as blood donation and maternal–child health law. DeepSeek-R1 also achieved perfect accuracy in smaller domains such as public health emergencies and occupational disease control.

Conclusion: DeepSeek-R1 demonstrated superior performance compared with ChatGPT-o1 in answering health law questions from the CNMLE, highlighting its potential as a reliable tool for medical education in China. The findings underscore the influence of linguistic and cultural context on LLM performance. Future work should expand evaluation to open-ended and case-based questions and explore fine-tuning strategies to enhance accuracy in healthcare settings.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,312 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,169 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,564 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,466 citations