This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
AI achieves board-level performance on the Japan diagnostic radiology board examination through direct image interpretation
Citations: 0
Authors: 13
Year: 2026
Abstract
This study evaluated text-only versus vision-enabled performance of late-2025 large language models (LLMs) on the Japan Diagnostic Radiology Board Examination (JDRBE) and compared model performance with that of newly board-certified radiologists.
Image-based questions from the JDRBE 2021 and 2023–2025 were collected, and ground truth answers were determined by expert consensus. Four commercial multimodal LLMs were evaluated: Gemini 2.5 Pro (March 2025, baseline), Gemini 3 Pro, GPT-5.1, and Claude Opus 4.5 (all released in November 2025). Each question was answered with image input (“vision”) and without images (“text-only”). For the JDRBE 2025, the subjective legitimacy of responses was independently rated by two radiologists on a five-point Likert scale, and low-rated responses were further analyzed by error type. Additional analyses on the JDRBE 2025 subset included image shuffling and a multi-run variability assessment (five runs). Model accuracies were also compared with those of five newly board-certified radiologists who passed the JDRBE 2025.
Gemini 3 Pro achieved the highest accuracy among all models, scoring 85.3% (279/327) in the vision condition and significantly outperforming its own text-only accuracy (74.3%, P < 0.001). Gemini 2.5 Pro and Claude Opus 4.5 also improved with image input, whereas GPT-5.1 did not. For the JDRBE 2025, Gemini 3 Pro in the vision condition received the highest legitimacy ratings, and its accuracy (88%) was above the range observed in a reference group of five newly board-certified radiologists (65%–83%), but hallucination was still the most common error type. Image-shuffling analysis on the 2025 subset showed no performance gain in any model, supporting reliance on visual input. Multi-run variability analysis showed high agreement across runs.
Among late-2025 commercial LLMs, Gemini 3 Pro demonstrated board-level performance on the JDRBE through direct medical image interpretation.
The performance of vision-enabled large language models on the Japan Diagnostic Radiology Board Examination was evaluated. Among the models released in November 2025, Gemini 3 Pro demonstrated significant capabilities in direct medical image interpretation, achieving accuracy above that of a reference group of five newly board-certified radiologists.
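The reported headline figure can be checked with simple arithmetic. The sketch below recomputes the vision-condition accuracy from the counts given in the abstract (279 of 327) and adds a Wilson score interval to show the sampling uncertainty around it. It also includes an exact McNemar test, a common choice for comparing two conditions on the same questions; the abstract does not say which test the authors used, so this is an assumption, and the function names are illustrative only.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: questions correct only in condition A (e.g. vision)
    c: questions correct only in condition B (e.g. text-only)
    Assumption: the source does not state which test produced P < 0.001.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability up to the smaller discordant count
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Counts taken from the abstract: Gemini 3 Pro, vision condition
    acc = accuracy(279, 327)
    low, high = wilson_interval(279, 327)
    print(f"accuracy = {acc:.1%}, 95% Wilson interval = ({low:.1%}, {high:.1%})")
```

The per-question discordant counts needed for `mcnemar_exact` are not given in the abstract, so any values passed to it would be hypothetical and would have to come from the full article or its data.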
Similar works
Refinement and reassessment of the SERVQUAL scale.
1991 · 3,967 citations
Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review
2005 · 3,781 citations
Radiobiology for the Radiologist.
1974 · 3,502 citations
International evidence-based recommendations for point-of-care lung ultrasound
2012 · 2,818 citations
Radiation Dose Associated With Common Computed Tomography Examinations and the Associated Lifetime Attributable Risk of Cancer
2009 · 2,431 citations