This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating and leveraging large language models in clinical pharmacology and therapeutics assessment: From exam takers to exam shapers

2025 · 6 citations · British Journal of Clinical Pharmacology · Open Access

Abstract

AIMS: In medical education, the ability of large language models (LLMs) to match human performance raises questions about their potential as educational tools. This study evaluates LLMs' performance on Clinical Pharmacology and Therapeutics (CPT) exams, comparing their results with those of medical students and exploring their ability to identify poorly formulated multiple-choice questions (MCQs). METHODS: [...] The exams included MCQs and open-ended questions assessing knowledge and prescribing skills. LLM results were analysed using the same scoring system as students. A confusion matrix was used to evaluate the ability of ChatGPT and Gemini to identify ambiguous/erroneous MCQs. RESULTS: [...] were genuine (100%), whereas local exam errors were frequently due to ambiguities or correction flaws (24.3%). When both ChatGPT and Gemini provided the same incorrect answer to an MCQ, the specificity for detecting ambiguous questions was 92.9%, with a negative predictive value of 85.5%. CONCLUSION: LLMs demonstrate capabilities comparable to or exceeding those of medical students in CPT exams. Their ability to flag potentially flawed MCQs highlights their value not only as educational tools but also as quality control instruments in exam preparation.
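The specificity and negative predictive value quoted in the abstract are standard quantities derived from a confusion matrix. The sketch below shows how they are computed, assuming that a "positive" means an MCQ is flagged (for instance because both models gave the same incorrect answer) and that the reference label is whether the MCQ really is ambiguous or erroneous. The class name `ConfusionMatrix` and the counts are hypothetical and chosen only for illustration; they are not the study's data and do not reproduce its figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary flagging rule applied to a set of MCQs.

    tp: flagged and truly ambiguous/erroneous
    fp: flagged but actually sound
    fn: not flagged but truly ambiguous/erroneous
    tn: not flagged and actually sound
    """

    tp: int
    fp: int
    fn: int
    tn: int

    def specificity(self) -> float:
        # Share of sound questions that the rule correctly leaves unflagged.
        return self.tn / (self.tn + self.fp)

    def negative_predictive_value(self) -> float:
        # Share of unflagged questions that really are sound.
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for illustration only (NOT taken from the paper).
example = ConfusionMatrix(tp=6, fp=4, fn=10, tn=80)

print(f"specificity = {example.specificity():.3f}")
print(f"NPV         = {example.negative_predictive_value():.3f}")
```

A high specificity means that few sound questions are wrongly flagged, while a high negative predictive value means that an unflagged question is likely to be sound. Neither figure says how many flawed questions the rule catches; that would be the sensitivity, `tp / (tp + fn)`.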

Topics

Artificial Intelligence in Healthcare and Education · Academic integrity and plagiarism · Innovations in Medical Education
