This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the capability of large language models in radiotherapy through professional certification examinations in Japan
Citations: 0
Authors: 13
Year: 2025
Abstract
Large language models (LLMs), such as ChatGPT and Grok, have advanced rapidly in natural language understanding and are increasingly being applied to specialized fields, including medicine. In this study, we evaluated the domain-specific radiotherapy knowledge of LLMs by assessing their performance on three certification examinations in Japan: the Japanese Medical Physicist Examination, the Japanese Board Examination for Radiologists and the Japanese Board Examination for Radiation Oncologists. We assessed five LLMs (ChatGPT-5, ChatGPT-5 Pro, Grok 4, Grok 4 Heavy and Gemini 2.5 Pro) by entering all multiple-choice questions from these examinations into each model and recording its responses. The AI-generated answers were compared with reference answers determined by experienced medical physicists and radiation oncologists. Average accuracies were 84.7 ± 2.0% (ChatGPT-5), 94.7 ± 2.1% (ChatGPT-5 Pro), 78.4 ± 1.2% (Grok 4), 81.6 ± 2.2% (Grok 4 Heavy) and 88.9 ± 1.2% (Gemini 2.5 Pro). All models exceeded 75% accuracy, and ChatGPT-5 Pro consistently outperformed the others, attaining an average accuracy above 90% across all examinations. These findings highlight the strong potential of advanced LLMs, particularly ChatGPT-5 Pro, for future integration into radiotherapy-related applications such as automated contouring and treatment planning support.
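The scoring procedure described in the abstract (compare each model's answers with a reference answer key, then report a mean ± spread across the three examinations) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The function names `exam_accuracy` and `summarize`, the representation of answers as sets of option labels (to allow questions that require selecting more than one option), the rule that a question counts as correct only on an exact match, and the reading of "±" as the sample standard deviation across examinations are all assumptions.

```python
from statistics import mean, stdev

# Hypothetical representation: question ID -> set of selected option labels.
# Sets allow for questions that require choosing more than one option.
Answers = dict[str, frozenset[str]]


def exam_accuracy(model_answers: Answers, reference: Answers) -> float:
    """Return the percentage of questions answered exactly like the reference key.

    Assumption: a question scores only if the selected options match the
    reference set exactly; unanswered questions count as incorrect.
    """
    if not reference:
        raise ValueError("reference answer key is empty")
    correct = sum(
        1 for qid, key in reference.items() if model_answers.get(qid) == key
    )
    return 100.0 * correct / len(reference)


def summarize(per_exam_accuracy: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) across examinations.

    Assumption: "±" is read as the sample SD over exams;
    stdev() needs at least two values.
    """
    return mean(per_exam_accuracy), stdev(per_exam_accuracy)


if __name__ == "__main__":
    # Illustrative toy data only, not taken from the study.
    reference = {
        "q1": frozenset("a"),
        "q2": frozenset("bd"),  # a "choose two" question
        "q3": frozenset("c"),
        "q4": frozenset("e"),
    }
    answers = {
        "q1": frozenset("a"),
        "q2": frozenset("b"),   # incomplete selection -> incorrect
        "q3": frozenset("c"),
    }                           # q4 unanswered -> incorrect
    print(exam_accuracy(answers, reference))  # 50.0
    avg, sd = summarize([80.0, 85.0, 90.0])
    print(f"{avg:.1f} ± {sd:.1f}%")           # 85.0 ± 5.0%
```

The exact-match rule is a deliberately strict choice: partially correct selections on multi-answer questions earn no credit. A study could equally apply partial credit or a different spread statistic, so the numbers this sketch produces are not directly comparable with the reported results.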