This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Large language models as educational collaborators: developing non-conventional teaching aids in pharmacology & therapeutics
Citations: 1
Authors: 2
Year: 2025
Abstract
BACKGROUND: With the growing integration of artificial intelligence in medical education, this study compares the quality and educational robustness of content generated by two large language models (LLMs), DeepSeek-V3 and ChatGPT 4.0, on the emerging, non-conventional topic of gender-affirming hormone therapy (GAHT), which is not yet covered in standard textbooks, across three educational phases: the preclerkship and clerkship phases of the undergraduate medical curriculum, and the master's level in pharmacology.

METHODS: A total of 23 prompts were designed to generate Specific Learning Objectives (SLOs), reading materials, assessment items (MCQs, SAQs, and OSPEs), and case-based learning (CBL) scenarios across the three learner stages. The outputs from both LLMs were evaluated independently using rubric-based frameworks assessing content appropriateness, pedagogical structure, assessment alignment, and inclusivity.

RESULTS: Both LLMs produced pedagogically sound outputs; however, DeepSeek consistently demonstrated superior adherence to rubric criteria. For SLOs, DeepSeek maintained a clear hierarchical progression across phases and showed greater precision, contextual alignment, and time-bound formulation. Its objectives were more assessable and reflected increasing cognitive complexity. ChatGPT's SLOs were inclusive and coherent but occasionally lacked time-specificity and structural clarity. In reading materials, DeepSeek outperformed by integrating clinical relevance, scaffolded structure, and interactive learning tools across all phases; it included visual aids, case vignettes, and phase-specific assessments, while ChatGPT's content was accurate and readable but leaned toward text-heavy exposition with fewer embedded learning activities. MCQs from both models adhered to core psychometric principles. DeepSeek avoided testwiseness cues more consistently and offered better stratification of difficulty and realism, especially at the master's level. ChatGPT demonstrated strong pharmacological accuracy but occasionally showed testwiseness cues and illogical distractor sequencing. In CBL and OSPE outputs, DeepSeek showed stronger alignment with instructional and assessment criteria through modular formatting, diverse patient representation, and integration of formative tools. ChatGPT's cases and OSPEs were realistic and engaging but more narrative and occasionally less standardized.

CONCLUSION: While both LLMs demonstrated educational utility, DeepSeek produced more rubric-aligned, contextually rich, and assessment-ready content across all learner stages. This study supports integrating advanced LLMs such as DeepSeek and ChatGPT into curriculum design, provided there is oversight to ensure alignment with pedagogical goals and learner needs.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,687 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,591 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,114 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,867 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations