This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Large Language Models in German Continuing Medical Education Assessment: Fully Crossed Experimental Study Protocol (Preprint)
Citations: 0
Authors: 6
Year: 2026
Abstract
<sec> <title>BACKGROUND</title> Continuing Medical Education (CME) is a legal and ethical obligation for physicians in Germany. The rapid rise of large language models (LLMs) such as ChatGPT, Gemini, Claude, and Grok raises concerns about the integrity of CME assessments, as LLMs can already pass German CME tests. </sec> <sec> <title>OBJECTIVE</title> To determine whether the choice of document format (searchable PDF, raster PDF, vector PDF) and LLM can influence the solvability of CME test questions by LLMs above the passing threshold specified for each CME module (typically 70%). </sec> <sec> <title>METHODS</title> In a fully crossed within-subjects repeated-measures structure, 18 expired CME articles from three major German publishers across six specialties will be converted into three PDF formats and processed by four current LLMs (ChatGPT-5, Mistral 3.1 small, Claude Sonnet 4, Grok-4) and two predecessor versions (ChatGPT-4o and Grok-3). Each model will answer every article once per file-format condition. This results in 18 experimental conditions. The primary outcome is the proportion of correctly answered questions; secondary outcomes are pass/fail rate and efficiency. The study has been approved by the University of Witten/Herdecke Ethics Committee (reference number S-260/2025, dated 08.10.2025) and is preregistered at the Open Science Framework (DOI: 10.17605/OSF.IO/V96R5). </sec> <sec> <title>RESULTS</title> Data collection will start in January 2026 and will last approximately 4 weeks. As of December 2025, the study has been preregistered, and no results are available yet. The analyses will quantify performance differences across document formats and model generations; these findings may inform the feasibility of non-searchable document formats as a temporary measure to reduce AI-enabled cheating risks in CME contexts. 
</sec> <sec> <title>CONCLUSIONS</title> By quantifying how document format constrains LLM performance, this study aims to evaluate simple technical safeguards that may reduce AI-assisted manipulation of CME tests and inform regulators and CME providers on balancing assessment validity, accessibility, and responsible LLM integration into postgraduate medical education. </sec> <sec> <title>CLINICALTRIAL</title> Open Science Framework DOI: 10.17605/OSF.IO/V96R5. </sec>
Related Works
The qualitative content analysis process
2008 · 21,614 citations
Making sense of Cronbach's alpha
2011 · 13,700 citations
Standards for Reporting Qualitative Research
2014 · 10,978 citations
Health professionals for a new century: transforming education to strengthen health systems in an interdependent world
2010 · 5,688 citations
Audit and feedback: effects on professional practice and healthcare outcomes
2012 · 5,491 citations