This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Large Language Models in Undergraduate Medical Education: A Scoping Review of Use Cases, Effectiveness, and Limitations
0
Citations
2
Authors
2025
Year
Abstract
Background
Large Language Models (LLMs) such as ChatGPT have been identified as potential additions to undergraduate medical education (UGME). Their applications include assessment, simulation, and personalized learning, although their efficacy and risks are still not well understood. Because the breadth and nature of the existing evidence remain unclear, a mapping of the evidence on their utility, accuracy, and limitations is needed to inform evidence-based integration.
Methods
A scoping review was performed to identify empirical literature on the use of LLMs in UGME. Eligible studies were experimental, cross-sectional, and qualitative designs that evaluated LLM use in formative and/or summative activities. Studies were included if they applied LLMs for assessment, simulation, or educational support among pre-clinical or clinical students. Study design, country/income level, LLM model, purpose, mode of use, student level, prompting mode, outcomes, and limitations were extracted.
Results
Nine studies from seven countries (all high- or upper-middle-income) were included. Designs ranged from cross-sectional trials and feasibility studies to qualitative focus groups and mixed-methods scoring analyses. The most frequently used LLM was ChatGPT (versions 3.5, 4, and 4o). Applications included MCQ generation, automated scoring (OSCEs and short answers), clinical simulation, revision support, and documentation feedback. Performance varied: 91% of generated MCQ templates were usable, automated scores correlated highly with human scores (r = 0.599–0.732), and GPT-4-generated items were judged nearly equivalent to expert-written questions. Reported risks included hallucinations (38% success rate), content errors, lack of empathy, and biased answer generation. Prompt engineering and human oversight were necessary to maintain output quality.
Conclusion
LLMs appear to be moderately to highly feasible in UGME settings, particularly when combined with structured prompts and expert review. Despite their potential for formative use and scalability, rigorous psychometric validation and learner-centered research are needed.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations