This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Large Language Models in German Continuing Medical Education Assessment: Fully Crossed Experimental Study Protocol (Preprint)
Citations: 0
Authors: 6
Year: 2026
Abstract
<sec> <title>BACKGROUND</title> Continuing Medical Education (CME) is a legal and ethical obligation for physicians in Germany. The rapid rise of large language models (LLMs) such as ChatGPT, Gemini, Claude, and Grok raises concerns about the integrity of CME assessments, as LLMs can already pass German CME tests. </sec> <sec> <title>OBJECTIVE</title> To determine whether the choice of document format (searchable PDF, raster PDF, vector PDF) and LLM can influence the solvability of CME test questions by LLMs above the passing threshold specified for each CME module (typically 70%). </sec> <sec> <title>METHODS</title> In a fully crossed within-subjects repeated-measures structure, 18 expired CME articles from three major German publishers across six specialties will be converted into three PDF formats and processed by four current LLMs (ChatGPT-5, Mistral 3.1 small, Claude Sonnet 4, Grok-4) and two predecessor versions (ChatGPT-4o and Grok-3). Each model will answer every article once per file-format condition. This results in 18 experimental conditions. The primary outcome is the proportion of correctly answered questions; secondary outcomes are pass/fail rate and efficiency. The study has been approved by the University of Witten/Herdecke Ethics Committee (reference number S-260/2025, dated 08.10.2025) and is preregistered at the Open Science Framework (DOI: 10.17605/OSF.IO/V96R5). </sec> <sec> <title>RESULTS</title> Data collection will start in January 2026 and will last approximately 4 weeks. As of December 2025, the study has been preregistered, and no results are available yet. The analyses will quantify performance differences across document formats and model generations; these findings may inform the feasibility of non-searchable document formats as a temporary measure to reduce AI-enabled cheating risks in CME contexts. 
</sec> <sec> <title>CONCLUSIONS</title> By quantifying how document format constrains LLM performance, this study aims to evaluate simple technical safeguards that may reduce AI-assisted manipulation of CME tests and inform regulators and CME providers on balancing assessment validity, accessibility, and responsible LLM integration into postgraduate medical education. </sec> <sec> <title>CLINICALTRIAL</title> Open Science Framework DOI: 10.17605/OSF.IO/V96R5. </sec>
Related Works
The qualitative content analysis process
2008 · 21,614 citations
Making sense of Cronbach's alpha
2011 · 13,700 citations
Standards for Reporting Qualitative Research
2014 · 10,978 citations
Health professionals for a new century: transforming education to strengthen health systems in an interdependent world
2010 · 5,688 citations
Audit and feedback: effects on professional practice and healthcare outcomes
2012 · 5,491 citations