This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Can large language models replace standardised patients?
Citations: 1
Authors: 10
Year: 2025
Abstract
Standardised patients (SPs) play a crucial role in medical education by allowing students to practise diagnostic skills in a risk-free environment. This not only boosts their confidence but also provides them with immediate feedback. However, despite their importance in training medical professionals, the deployment and integration of SPs into educational systems in developing regions face significant obstacles.1 These include the high cost of training, varying levels of medical education and socio-cultural differences. The emergence of large language models (LLMs) has further catalysed transformations in medical education. Evaluating the effectiveness and reliability of LLMs as substitutes for SPs is especially important in regions with limited medical resources. To evaluate the viability of LLMs as SPs, we designed a study in which LLMs were prompted to simulate SPs. The process involved transcribing video recordings of clinical student encounters with a human SP into text, which resulted in a dataset of 6600 questions and answers. Open-source and closed-source LLMs were tested, and their performance was evaluated by independent expert clinical physicians in a blinded manner. Based on the evaluations, we developed a teaching system powered by the most capable LLM. All participants completed two sequentially administered standardised clinical examinations using a repeated-measures design, first with human SPs and then with LLM-simulated SPs. A questionnaire survey, developed through expert consultation and group discussions, was used to assess students' experiences, focusing on exam difficulty, psychological feelings and the effectiveness of role-play. Utilising LLMs in the role of SPs has generated significant interest and enthusiasm among educators and learners. Additionally, this approach has underscored the vast potential of artificial intelligence in reshaping the landscape of medical education.
The expertise and availability of SPs represent a precious resource, and the integration of LLMs can enhance the scope of SP-based instructional resources. Currently, LLMs are effectively utilised to augment the instructional approach of SPs. They facilitate both pre- and post-practice review sessions with SPs, thereby increasing the number of training instances available to students. The blind test indicated that two LLMs scored higher than SPs. However, survey results revealed that students' ratings of SPs exceeded those of LLMs in terms of examination difficulty and role-play assessment. SPs were rated lower than LLMs in terms of students' psychological experiences, and no significant differences were found in process experiences. LLMs and SPs each have unique strengths, making LLMs a valuable supplement to, rather than a replacement for, SPs. The advantage of LLMs lies in their ability to conduct simulated consultations anytime and anywhere, helping students feel more relaxed and confident. In contrast, students rated SPs higher in terms of examination difficulty and role-play assessment. This feedback underscores the nuanced and complex role that SPs play in medical education, aspects that current LLMs may not fully replicate. Future research should investigate the role-playing capabilities of LLMs as SPs across multiple languages and explore methods to enhance their performance, such as supervised fine-tuning and continued pre-training. Additionally, developing embodied agents that leverage LLMs represents a significant advancement in medical education methodologies. Weipeng Han, Xiaohong Lyu, and Ji-Jiang Yang contributed equally to this work, including results interpretation and manuscript preparation, and share co-first authorship. Weipeng Han contributed significantly to the conceptualisation and design of the experiments and methodology. Xiaohong Lyu was instrumental in data collection and analysis.
Ji-Jiang Yang was pivotal in designing and implementing the software. Shi Chen, Jiming Zhu, and Xiaoming Huang are credited as co-corresponding authors for their joint supervision of the entire research process and collaborative leadership. Mengsha Yan performed critical review and editing of the manuscript while providing essential research resources. Yuelun Zhang conducted formal data analysis and contributed to manuscript refinement through editorial review. Tingyan Wang managed data curation, developed software tools, and participated in editorial revisions of the paper. Hui Pan oversaw research supervision and provided expert guidance during the manuscript's review and editing phases. This study was funded by the Peking Union Medical College Hospital Teaching Programme (X361011). Thanks to Yutong Wang, Qingyi Zuo, Xiaoyuan Guo, Nan Jiang, Bochuan Huang, Daiyu Yang, Sijing Yan, Yuxin Sun, Guanlin Lu, Jiaqi Qiang, and Xinke Zhou for their assistance in proofreading the voice-to-text content. During the preparation of this work the authors used ChatGLM to refine the phrasing. After using this service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication. Ethics approval for the study was obtained from the Peking Union Medical College Hospital Ethics Committee (K4899). The data that support the findings of this study are available from the corresponding author upon reasonable request.
Similar works
"Why Should I Trust You?"
2016 · 14,227 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,601 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Artificial intelligence in healthcare: past, present and future
2017 · 4,387 citations