OpenAlex · Updated hourly · Last updated: 16 May 2026, 12:53

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Large language models as educational collaborators: developing non-conventional teaching aids in pharmacology & therapeutics

2025 · 1 citation · BMC Medical Education · Open Access
Open full text at the publisher

Citations: 1 · Authors: 2 · Year: 2025

Abstract

BACKGROUND: With the growing integration of artificial intelligence in medical education, this study compares the quality and educational robustness of content generated by two large language models (LLMs), DeepSeek-V3 and ChatGPT 4.0, on the emerging, non-conventional topic of gender-affirming hormone therapy (GAHT), which is not yet covered in standard textbooks, across three educational phases: the preclerkship and clerkship phases of the undergraduate medical curriculum, and the master's level in pharmacology.

METHODS: A total of 23 prompts were designed to generate Specific Learning Objectives (SLOs), reading materials, assessment items (MCQs, SAQs, and OSPEs), and case-based learning (CBL) scenarios across the three learner stages. The outputs from both LLMs were evaluated independently using rubric-based frameworks assessing content appropriateness, pedagogical structure, assessment alignment, and inclusivity.

RESULTS: Both LLMs produced pedagogically sound outputs; however, DeepSeek consistently demonstrated superior adherence to rubric criteria. For SLOs, DeepSeek maintained a clear hierarchical progression across phases and showed greater precision, contextual alignment, and time-bound formulation. Its objectives were more assessable and better reflected increasing cognitive complexity. ChatGPT's SLOs were inclusive and coherent but occasionally lacked time-specificity and structural clarity. In reading materials, DeepSeek outperformed by integrating clinical relevance, scaffolded structure, and interactive learning tools across all phases. It included visual aids, case vignettes, and phase-specific assessments, while ChatGPT's content was accurate and readable but leaned toward text-heavy exposition with fewer embedded learning activities. MCQs from both models adhered to core psychometric principles. DeepSeek avoided testwiseness cues more consistently and offered better stratification of difficulty and realism, especially at the master's level. ChatGPT demonstrated strong pharmacological accuracy but occasionally showed testwiseness cues and illogical distractor sequencing. In CBL and OSPE outputs, DeepSeek showed stronger alignment with instructional and assessment criteria through modular formatting, diverse patient representation, and integration of formative tools. ChatGPT's cases and OSPEs were realistic and engaging but more narrative and occasionally less standardized.

CONCLUSION: While both LLMs demonstrated educational utility, DeepSeek produced more rubric-aligned, contextually rich, and assessment-ready content across all learner stages. This study supports the integration of advanced LLMs such as DeepSeek and ChatGPT in curriculum design, provided there is oversight to ensure alignment with pedagogical goals and learner needs.

Topics

Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Machine Learning in Materials Science