OpenAlex · Updated hourly · Last updated: 20.03.2026, 12:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Use of large language models in undergraduate medical education: a scoping review

2026 · 0 citations · Discover Education · Open Access
Open full text at publisher

Citations: 0 · Authors: 2 · Year: 2026

Abstract

Background: Large Language Models (LLMs), such as ChatGPT, are increasingly utilized in undergraduate medical education (UGME) for assessment, simulation, and personalized learning. However, the breadth, nature, and feasibility of their application remain unclear; a comprehensive mapping of the evidence on their utility, accuracy, feasibility, and reported performance outcomes is necessary to inform evidence-based integration.

Methods: A scoping review was performed following PRISMA-ScR and JBI guidelines to identify the empirical literature on the use of LLMs in UGME. PubMed and Google Scholar were systematically searched for studies published from January 2021 to July 2025. Eligible studies were experimental, cross-sectional, and qualitative designs that evaluated LLM use in formative and/or summative activities. Studies were included if they applied LLMs for assessment, simulation, or educational support among pre-clinical or clinical students. Study design, country/income level, LLM model, purpose, mode of use, student level, prompting mode, outcomes, and limitations were extracted.

Results: A total of eight studies were included from seven countries (high- and upper-middle-income); these countries indicate where the studies were conducted, not the countries of origin of the LLMs or their educational contexts. Study designs ranged from cross-sectional trials and feasibility studies to qualitative focus groups and mixed-methods scoring analyses. The most popular LLM was ChatGPT (versions 3.5, 4, and 4o). Applications included MCQ generation, automated scoring (OSCEs and short answers), clinical simulation, revision support, and documentation feedback. Performance varied: 91% of generated MCQ templates were usable, automated scores correlated highly with human scores (r = 0.599–0.732), and GPT-4 items were judged nearly equivalent to expert-written questions. Reported risks included hallucinations (38% success rate), content errors, lack of empathy, and biased answer generation. Prompt engineering and human oversight were necessary to ensure output quality.

Conclusion: LLMs demonstrate moderate to high feasibility in UGME settings, particularly when combined with structured prompts, expert review, and an awareness of AI limitations. Despite their potential for formative use and scalability, rigorous psychometric validation, mitigation of hallucinations and bias, multi-institutional validation, and learner-centred research are needed.

Topics

Artificial Intelligence in Healthcare and Education · Simulation-Based Education in Healthcare · Diversity and Career in Medicine