This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparative Analysis of LLMs’ Performance On a Practice Radiography Certification Exam

2025 · 2 citations · Radiologic Technology

Abstract

PURPOSE: To compare the performance of multiple large language models (LLMs) on a practice radiography certification exam.

METHOD: Using an exploratory, nonexperimental approach, 200 multiple-choice question stems and options (correct answers and distractors) from a practice radiography certification exam were entered into 5 LLMs: ChatGPT (OpenAI), Claude (Anthropic), Copilot (Microsoft), Gemini (Google), and Perplexity (Perplexity AI). Responses were recorded as correct or incorrect, and overall accuracy rates were calculated for each LLM. McNemar tests determined if there were significant differences between accuracy rates. Performance also was evaluated and aggregated by content categories and subcategories.

RESULTS: ChatGPT had the highest overall accuracy of 83.5%, followed by Perplexity (78.9%), Copilot (78.0%), Gemini (75.0%), and Claude (71.0%). ChatGPT had a significantly higher accuracy rate than did Claude (P < .001) and Gemini (P = .02). Regarding content categories, ChatGPT was the only LLM to correctly answer all 38 patient care questions. In addition, ChatGPT had the highest number of correct responses in the areas of safety (38/48, 79.2%) and procedures (50/59, 84.7%). Copilot had the highest number of correct responses in the area of image production (43/55, 78.2%). ChatGPT also achieved superior accuracy in 4 of the 8 subcategories.

DISCUSSION: Findings from this study provide valuable insights into the performance of multiple LLMs in answering practice radiography certification exam questions. Although ChatGPT emerged as the most accurate LLM for this practice exam, caution should be exercised when using generative artificial intelligence (AI) models. Because LLMs can generate false and incorrect information, responses must be checked for accuracy, and the models should be corrected when inaccurate responses are given.

CONCLUSION: Among the 5 LLMs compared in this study, ChatGPT was the most accurate model. As interest in generative AI continues to increase and new language applications become readily available, users should understand the limitations of LLMs and check responses for accuracy. Future research could include additional practice exams in other primary pathways, including magnetic resonance imaging, nuclear medicine technology, radiation therapy, and sonography.
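
The accuracy figures and the paired comparison described in the abstract can be illustrated with a short sketch. This is not the authors' analysis code. The abstract does not say which variant of the McNemar test was used, so the exact binomial form is assumed here, and the function names `accuracy` and `mcnemar_exact` are hypothetical. The category counts (38/48, 50/59, 43/55) come from the abstract, while the discordant-pair counts passed to `mcnemar_exact` are invented purely for illustration, because the abstract does not report them.

```python
from math import comb


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: questions model A answered correctly and model B incorrectly
    c: questions model B answered correctly and model A incorrectly
    Questions both models got right, or both got wrong, do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of k or fewer, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Category counts reported in the abstract.
    for label, correct, total in [
        ("ChatGPT, safety", 38, 48),
        ("ChatGPT, procedures", 50, 59),
        ("Copilot, image production", 43, 55),
    ]:
        print(f"{label}: {accuracy(correct, total)}%")

    # HYPOTHETICAL discordant counts, not taken from the study.
    print(f"exact McNemar p = {mcnemar_exact(10, 2):.4f}")
```

Because every model answered the same 200 questions, the comparison is paired, which is why the test uses only the questions on which two models disagree and not the overall accuracy rates alone.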

Authors

Kevin Clark

Topics

Artificial Intelligence in Healthcare and Education
Radiology practices and education
Radiomics and Machine Learning in Medical Imaging
