OpenAlex · Updated hourly · Last updated: 2026-04-07, 22:47

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

AI achieves board-level performance on the Japan diagnostic radiology board examination through direct image interpretation

2026 · 0 citations · Japanese Journal of Radiology · Open Access

0 citations · 13 authors · 2026

Abstract

To evaluate text-only versus vision-enabled performance of late-2025 large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE) and compare model performance with newly board-certified radiologists.

Image-based questions from the JDRBE 2021 and 2023–2025 were collected, and ground truth answers were determined by expert consensus. Four commercial multimodal LLMs were evaluated: Gemini 2.5 Pro (March 2025, baseline), Gemini 3 Pro, GPT-5.1, and Claude Opus 4.5 (all released in November 2025). Each question was answered with image input (“vision”) and without images (“text-only”). For the JDRBE 2025, the subjective legitimacy of responses was independently rated by two radiologists using a five-point Likert scale, and low-rated responses were further analyzed by error type. Additional analyses on the JDRBE 2025 subset included image shuffling and a multi-run variability assessment (five runs). Model accuracies were also compared with those of five newly board-certified radiologists who passed the JDRBE 2025.

Gemini 3 Pro achieved the highest accuracy among all models, scoring 85.3% (279/327) in the vision condition and significantly outperforming its text-only accuracy (74.3%, P < 0.001). Gemini 2.5 Pro and Claude Opus 4.5 also improved with image input, whereas GPT-5.1 did not. For the JDRBE 2025, Gemini 3 Pro in the vision condition received the highest legitimacy ratings, and its accuracy (88%) was above the range observed in a reference group of five newly board-certified radiologists (65%–83%); hallucination was nonetheless the most common error type. Image-shuffling analysis on the 2025 subset showed no performance gain in any model, supporting genuine reliance on visual input. Multi-run variability analysis showed high agreement across runs.

Among late-2025 commercial LLMs, Gemini 3 Pro demonstrated board-level performance on the JDRBE through direct medical image interpretation.
The performance of vision-enabled large language models on the Japan Diagnostic Radiology Board Examination was evaluated. Among the models released in November 2025, Gemini 3 Pro demonstrated significant capabilities in direct medical image interpretation, achieving accuracy above that of a reference group of five newly board-certified radiologists.
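The abstract reports a significant vision versus text-only difference (P < 0.001) on the same set of paired question responses. The paper does not state which test was used; a paired comparison of correct/incorrect answers from two conditions on the same items is commonly done with an exact McNemar test, sketched below. The function and the example counts are illustrative assumptions, not taken from the paper.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from discordant pair counts.

    b: items correct in condition A but wrong in condition B.
    c: items correct in condition B but wrong in condition A.
    Under the null, the discordant pairs split Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 0.5), then double for two-sidedness.
    p_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * p_tail)

# Hypothetical discordant counts (NOT reported in the paper):
# 40 questions answered correctly only with vision input,
# 8 answered correctly only in the text-only condition.
print(mcnemar_exact(40, 8))
```

Only the discordant pairs enter the test: questions answered identically in both conditions carry no information about which condition is better, which is why the concordant counts never appear in the formula.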
