This is an overview page with metadata about this scientific work. The full article is available from the publisher.
A Comprehensive Evaluation of the Performance of Large Language Models on the Japanese National Examination for Radiological Technologists
Citations: 0 · Authors: 5 · Year: 2026
Abstract
PURPOSE: This study aimed to evaluate the performance of several large language models (LLMs) on the Japanese National Examination for Radiological Technologists and to characterize their performance profiles. METHODS: We used a dataset of questions from 12 consecutive years of the national examination (the 65th through 76th administrations), excluding items that were officially retracted or deemed inappropriate. Five LLMs (ChatGPT-3.5, Gemini 2.5 Flash, Gemini 2.5 Pro, Copilot, and Claude Sonnet 4) were prompted to answer these questions. The accuracy of each LLM was calculated for the entire question set and for subsets grouped by question format. RESULTS: Gemini 2.5 Pro achieved the highest accuracy on the examination as a whole and in many individual subject areas. An analysis by question format revealed a general trend: most LLMs performed best on text-based questions, followed by calculation-based and then image-based questions. However, some models performed notably well on calculation-based problems specifically. CONCLUSION: Although LLMs show considerable proficiency in answering questions from the National Examination for Radiological Technologists, our findings also reveal significant limitations, particularly in interpreting image-based problems. This study highlights both the potential utility and the current challenges of using LLMs as supplementary learning tools for this professional certification examination.
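The accuracy calculation described in the methods (overall and per question format, after dropping excluded items) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the `Response` record, the `accuracy` helper, the format labels, and the model names `model_a`/`model_b` are all hypothetical, and the toy data are placeholders rather than results from the study.

```python
from collections import defaultdict
from typing import NamedTuple


class Response(NamedTuple):
    """One model's answer to one exam item (hypothetical record layout)."""
    model: str      # LLM identifier
    fmt: str        # question format, assumed to be "text", "calculation", or "image"
    excluded: bool  # True if the item was officially retracted or deemed inappropriate
    correct: bool   # True if the model's answer matched the official key


def accuracy(responses, by_format=False):
    """Return the fraction of correct answers per model, or per (model, format)
    pair when by_format is True, counting only non-excluded items."""
    tally = defaultdict(lambda: [0, 0])  # group -> [n_correct, n_total]
    for r in responses:
        if r.excluded:
            continue  # excluded items count toward neither numerator nor denominator
        group = (r.model, r.fmt) if by_format else r.model
        tally[group][0] += int(r.correct)
        tally[group][1] += 1
    return {group: n_correct / n_total for group, (n_correct, n_total) in tally.items()}


# Hypothetical toy data for illustration only; NOT data from the study.
toy = [
    Response("model_a", "text", False, True),
    Response("model_a", "text", False, True),
    Response("model_a", "image", False, False),
    Response("model_a", "calculation", True, False),  # excluded item, ignored
    Response("model_b", "text", False, True),
    Response("model_b", "image", False, True),
]

print(accuracy(toy))
print(accuracy(toy, by_format=True))
```

Keying the tally by `(model, format)` lets the same function produce both the overall and the per-format breakdowns; a per-subject-area breakdown would follow the same pattern with an additional field.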
Similar works
Refinement and reassessment of the SERVQUAL scale
1991 · 3,967 citations
Radiobiology for the Radiologist
1974 · 3,502 citations
ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee
2017 · 2,431 citations
Accuracy of Physician Self-assessment Compared With Observed Measures of Competence
2006 · 2,325 citations
Technology as an Occasion for Structuring: Evidence from Observations of CT Scanners and the Social Order of Radiology Departments
1986 · 2,249 citations