OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 21.03.2026, 15:17

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Evaluation of AI models for radiology exam preparation: DeepSeek vs. ChatGPT−3.5

2025·0 Zitationen·Medical Education OnlineOpen Access
Volltext beim Verlag öffnen

0

Zitationen

3

Autoren

2025

Jahr

Abstract

The rapid advancement of artificial intelligence (AI) chatbots has generated significant interest regarding their potential applications within medical education. This study sought to assess the performance of the open-source large language model DeepSeek-V3 in answering radiology board-style questions and to compare its accuracy with that of ChatGPT-3.5.A total of 161 questions (comprising 207 items) were randomly selected from the <i>Exercise Book for the National Senior Health Professional Qualification Examination: Radiology</i>. The question set included single-choice, multiple-choice, shared-stem, and case analysis questions. Both DeepSeek-V3 and ChatGPT-3.5 were evaluated using the same question set over a seven-day testing period. Response accuracy was systematically assessed, and statistical analyses were performed using Pearson's chi-square test and Fisher's exact test.DeepSeek-V3 achieved an overall accuracy of 72%, which was significantly higher than the 55.6% accuracy achieved by ChatGPT-3.5 (<i>P</i> < 0.001). Performance analysis by question type revealed DeepSeek's superior accuracy in single-choice questions (87.1%), though with comparatively lower performance in multiple-choice (55.7%) and case analysis questions (68.0%). Across clinical subspecialties, DeepSeek consistently outperformed ChatGPT, particularly in peripheral nervous system (<i>P</i> = 0.003), respiratory system (<i>P</i> = 0.008), circulatory system (<i>P</i> = 0.012), and musculoskeletal system (<i>P</i> = 0.021) domains.In conclusion, DeepSeek demonstrates considerable potential as an educational tool in radiology, particularly for knowledge recall and foundational learning applications. However, its relatively weaker performance on higher-order cognitive tasks and complex question formats suggests the need for further model refinement. Future research should investigate DeepSeek's capability in processing image-based questions and perform comparative analyses with more advanced models (e.g., GPT-5) to better evaluate its potential for medical education.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationClinical Reasoning and Diagnostic SkillsRadiology practices and education
Volltext beim Verlag öffnen