This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Evaluating Google Gemini’s Capability to Generate NBME-Standard Pharmacology Questions Using a 16-Criterion NBME Rubric
Citations: 0
Authors: 3
Year: 2025
Abstract
Background: Large language models (LLMs) such as Google Gemini have demonstrated strong capabilities in natural language generation, but their ability to create medical assessment items aligned with National Board of Medical Examiners (NBME) standards remains underexplored. Objective: This study evaluated the quality of Gemini-generated NBME-style pharmacology questions using a structured rubric to assess accuracy, clarity, and alignment with examination standards. Methods: Ten pharmacology questions were generated using a standardized prompt and assessed independently by two pharmacology experts. Each item was evaluated using a 16-criterion NBME rubric with binary scoring. Inter-rater reliability was calculated (Cohen’s Kappa = 0.81) following a calibration session. Results: On average, questions met 14.3 of 16 criteria. Strengths included logical structure, appropriate distractors, and clinically relevant framing. Limitations included occasional pseudo-vignettes, cueing issues, and one instance of factual inaccuracy (albuterol mechanism of action). The evaluation highlighted Gemini’s ability to produce high-quality NBME-style questions, while underscoring concerns regarding sample size, reproducibility, and factual reliability. Conclusions: Gemini shows promise as a tool for generating pharmacology assessment items, but its probabilistic outputs, factual inaccuracies, and limited scope necessitate caution. Larger-scale studies, inclusion of multiple medical disciplines, incorporation of student performance data, and use of broader expert panels are recommended to establish reliability and educational applicability.
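The abstract reports inter-rater reliability as Cohen's Kappa = 0.81 over binary rubric scores. As a rough illustration of how that statistic is computed for two raters, here is a minimal sketch; the ratings shown are hypothetical examples, not the study's actual data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning a label (here 0/1) to each item.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each rater's marginals.
    """
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    # Observed agreement: fraction of items where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(rater_a) | set(rater_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary scores for one question against 16 rubric criteria
# (1 = criterion met), one list per rater:
rater_1 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
rater_2 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

In practice, kappa would be computed over all raters' scores across the full item set (here, 10 questions × 16 criteria) rather than a single item as sketched above.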
Similar Works
International Journal of Scientific and Research Publications
2022 · 2,691 citations
Student writing in higher education: An academic literacies approach
1998 · 2,495 citations
Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling
2012 · 2,309 citations
How Many Scientists Fabricate and Falsify Research? A Systematic Review and Meta-Analysis of Survey Data
2009 · 1,921 citations
Chatting and cheating: Ensuring academic integrity in the era of ChatGPT
2023 · 1,789 citations