OpenAlex · Updated hourly · Last updated: 29.03.2026, 19:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating Google Gemini’s Capability to Generate NBME-Standard Pharmacology Questions Using a 16-Criterion NBME Rubric

2025 · 0 citations · Algorithms · Open Access

Citations: 0 · Authors: 3 · Year: 2025

Abstract

Background: Large language models (LLMs) such as Google Gemini have demonstrated strong capabilities in natural language generation, but their ability to create medical assessment items aligned with National Board of Medical Examiners (NBME) standards remains underexplored. Objective: This study evaluated the quality of Gemini-generated NBME-style pharmacology questions using a structured rubric to assess accuracy, clarity, and alignment with examination standards. Methods: Ten pharmacology questions were generated using a standardized prompt and assessed independently by two pharmacology experts. Each item was evaluated using a 16-criterion NBME rubric with binary scoring. Inter-rater reliability was calculated (Cohen’s Kappa = 0.81) following a calibration session. Results: On average, questions met 14.3 of 16 criteria. Strengths included logical structure, appropriate distractors, and clinically relevant framing. Limitations included occasional pseudo-vignettes, cueing issues, and one instance of factual inaccuracy (albuterol mechanism of action). The evaluation highlighted Gemini’s ability to produce high-quality NBME-style questions, while underscoring concerns regarding sample size, reproducibility, and factual reliability. Conclusions: Gemini shows promise as a tool for generating pharmacology assessment items, but its probabilistic outputs, factual inaccuracies, and limited scope necessitate caution. Larger-scale studies, inclusion of multiple medical disciplines, incorporation of student performance data, and use of broader expert panels are recommended to establish reliability and educational applicability.
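The abstract reports inter-rater reliability as Cohen's Kappa over binary rubric scores from two raters. As a rough illustration of how such a figure can be computed, here is a minimal sketch. The function name `cohens_kappa` and the toy ratings are assumptions made for this example only; they are not the study's data or code, and the result does not reproduce the reported value of 0.81.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy binary scores (1 = criterion met, 0 = not met); illustrative values only.
toy_rater_a = [1, 1, 1, 0, 1, 1, 0, 1]
toy_rater_b = [1, 1, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(toy_rater_a, toy_rater_b)
print(f"kappa = {kappa:.3f}")
```

In a setup like the one described, each rater's list would hold one binary score per criterion per question (10 questions × 16 criteria), flattened in the same order for both raters.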

Topics

Academic integrity and plagiarism · Artificial Intelligence in Healthcare and Education · Biomedical and Engineering Education