OpenAlex · Updated hourly · Last updated: 20.03.2026, 06:16

This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Performance of ChatGPT-4 in Answering German-Language CME Questions by Medical Laypersons: A Randomized Controlled Trial (Preprint)

2025 · 0 citations
Open full text at the publisher

0 citations · 7 authors · year 2025

Abstract

<sec> <title>BACKGROUND</title> Points for Continuing Medical Education (CME) are mandatory for medical specialists in Germany and are associated with minimum time requirements. These CME points are awarded upon successful completion of standardized knowledge assessments, typically based on educational articles provided by publishers. Recently, interest has grown in how artificial intelligence might influence or even automate this learning process. </sec> <sec> <title>OBJECTIVE</title> We investigate for the first time whether medical laypersons, using a Large Language Model (ChatGPT-4), can answer German-language CME tests in such a way that official points can be awarded, and how much time this requires. In doing so, we aim to assess the performance, efficiency, and implications of AI-assisted self-learning in regulated medical education settings. </sec> <sec> <title>METHODS</title> The study was registered in international study registries (Open Science Framework, OSF: doi:10.17605/OSF.IO/MZNUF; International Registered Report Identifier, IRRID: PRR1-10.2196/63887, doi:10.2196/63887) and received a positive vote from the ethics committee of Witten/Herdecke University (Ref. No. S-108/2024). In a randomized, single-blinded study, 18 CME tests were conducted in each of three study arms (n = 3 ∙ 18 = 54 datasets in total): "Search and Find" (CME text and full-text search only), "Answers Only" (ChatGPT-4 without CME material), and "All-in" (ChatGPT-4 with the complete CME material, including the questions, in an uploaded PDF file). We compared results using Mann-Whitney U and Fisher's exact tests; we defined fast test runs as completion in less than 30 minutes. </sec> <sec> <title>RESULTS</title> Arm 1 achieved a median of 50% correct answers (IQR 40%) with a processing time of 38.5 min. Arm 2 passed with a median of 80% (IQR 20%; p < 0.0001) in 7.5 min (IQR 3 min; p < 0.001). Arm 3 achieved 95% (IQR 10%; p < 0.0001) in 3 min (IQR 2 min; p < 0.0001). Compared to publisher data, the AI performance in Arm 3 did not differ significantly from human results. </sec> <sec> <title>CONCLUSIONS</title> ChatGPT-4 enables medical laypersons to answer text-based CME tests quickly and completely, allowing official points to be earned. This raises questions about the integrity of self-study and suggests that CME systems need to be adapted both technically and in regulatory terms. </sec> <sec> <title>CLINICALTRIAL</title> Open Science Framework, OSF: doi:10.17605/OSF.IO/MZNUF; International Registered Report Identifier, IRRID: PRR1-10.2196/63887, doi:10.2196/63887 </sec> <sec> <title>INTERNATIONAL REGISTERED REPORT</title> RR2-10.2196/63887 </sec>
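The between-arm comparison described in METHODS rests on the Mann-Whitney U statistic. As a minimal sketch of what that statistic measures, the snippet below computes U by hand on made-up percent-correct scores; the values are illustrative only and are not the study's raw data.

```python
# Hypothetical sketch: the Mann-Whitney U statistic computed by hand.
# The scores below are invented for illustration, NOT the study's data.
def mann_whitney_u(xs, ys):
    # U = number of (x, y) pairs with x > y, plus half-credit for ties.
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

arm1 = [30, 40, 50, 50, 60, 70]    # "Search and Find": median ~50% correct
arm3 = [85, 90, 95, 95, 100, 100]  # "All-in": median ~95% correct

u = mann_whitney_u(arm1, arm3)
print(u)  # 0.0 — every Arm-1 score falls below every Arm-3 score
```

A U of 0 (complete separation of the two samples) is the most extreme value possible and corresponds to a very small p-value, consistent with the reported p < 0.0001.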

Similar works

Authors

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Innovations in Medical Education