This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Performance benchmarking of LLMs on Chinese national medical licensing education: Cross-lingual and question-type effects

2026 · 0 citations · PLoS ONE · Open Access

Abstract

BACKGROUND: How language and question type affect the accuracy of large language models (LLMs) on the Chinese national medical licensing examination remains insufficiently explored.

METHODS: In this cross-sectional study (May 13-20, 2025), 396 educational questions (198 English-Chinese pairs) were extracted from the Chinese national medical licensing examination. ChatGPT-4o, ChatGPT-o3, Gemini-2.5-pro, Deepseek-V3, Deepseek-R1, and Doubao-1.5-pro were prompted to provide answers. Responses were compared against reference answers, and accuracy was computed for three question types: basic knowledge (Type A), case analysis (Type B), and integrative judgment (Type C).

RESULTS: Across all question types and languages, Doubao-1.5-pro achieved the highest accuracy at 92.0% ± 1.3%, whereas ChatGPT-4o had the lowest at 82.8% ± 3.7%. There was a significant main effect of question type (P = 0.0038) but no main effect of language (P = 0.56). Post hoc tests confirmed that performance on Type A exceeded that on Types B and C (P < 0.01), whereas Types B and C did not differ from each other. Among the models, Doubao-1.5-pro, Deepseek-R1, and Deepseek-V3 demonstrated notable cross-lingual stability, with accuracy differences between the Chinese and English versions remaining below 5%.

CONCLUSION: Question type was a key factor affecting LLM performance on Chinese medical licensing examination questions, whereas language had no significant impact. Doubao-1.5-pro, Deepseek-R1, and Deepseek-V3 demonstrated particularly strong cross-lingual consistency. These findings point to the potential value of specialized LLMs for enhancing medical education in China.
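The scoring step described in the methods (comparing responses against reference answers, then computing accuracy per question type and language) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the function names `accuracy_by_group` and `cross_lingual_gap`, the language codes `"zh"` and `"en"`, and the toy data are all assumptions made for illustration, and the abstract does not state how answers were normalized before comparison.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return accuracy per (question_type, language) group.

    records: iterable of (question_type, language, model_answer, reference_answer).
    Answers are compared after trimming whitespace and upper-casing; this
    normalization is an assumption, not something the abstract specifies.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qtype, lang, answer, reference in records:
        key = (qtype, lang)
        total[key] += 1
        if answer.strip().upper() == reference.strip().upper():
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def cross_lingual_gap(acc, qtype, lang_a="zh", lang_b="en"):
    """Absolute accuracy difference between two language versions of one question type."""
    return abs(acc[(qtype, lang_a)] - acc[(qtype, lang_b)])


# Hypothetical toy responses for a single model; not data from the study.
records = [
    ("A", "zh", "B", "B"),
    ("A", "zh", "C", "C"),
    ("A", "en", "B", "B"),
    ("A", "en", "D", "C"),
    ("B", "zh", "A", "A"),
    ("B", "en", "a", "A"),
]

acc = accuracy_by_group(records)
for (qtype, lang), value in sorted(acc.items()):
    print(f"Type {qtype} [{lang}]: {value:.1%}")
print(f"Type A cross-lingual gap: {cross_lingual_gap(acc, 'A'):.1%}")
```

A gap computed this way is a difference in percentage points between the Chinese and English versions, which is one plausible reading of the "below 5%" stability criterion reported above.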

Topics

Artificial Intelligence in Healthcare and Education
Innovations in Medical Education
Academic integrity and plagiarism
