
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Performance comparison of large language models in boron neutron capture therapy knowledge assessment

2026 · 0 citations · Scientific Reports · Open Access

Citations: 0
Authors: 8
Year: 2026

Abstract

Accelerator-based boron neutron capture therapy (BNCT) is a binary radiation therapy that has developed rapidly in recent years. This study systematically evaluated and compared the performance of four mainstream model families [ChatGPT, Bard (Gemini), Claude, and ERNIE Bot] in answering BNCT-related knowledge questions, providing a reference for exploring their potential in BNCT professional education. Forty-seven bilingual BNCT questions covering key concepts, clinical practice, and reasoning tasks were constructed. The four model families were tested across five rounds in two languages and question formats. Accuracy, reasoning ability, uncertainty expression, and version effects were analyzed. ChatGPT (72.8%) and Claude (70.4%) showed significantly higher overall accuracy than Bard (Gemini) (62.0%) and ERNIE Bot (55.6%) (p < 0.001). Both high-performing models did significantly better on reasoning-based questions than on fact-based questions (p < 0.001). The average performance improvement from version updates (7.51 ± 8.46 percentage points) was numerically higher than the change during same-version maintenance (0.61 ± 8.68 percentage points), but the difference did not reach statistical significance (p = 0.126). Although language and questioning method showed statistically significant effects, the effect sizes were minimal (partial η² < 0.01). Uncertainty acknowledgment rates varied significantly among the model families (4.7% to 23.7%, p = 0.003). ChatGPT can provide relatively accurate knowledge for the popularization of BNCT. However, existing general-purpose LLMs still cannot answer all BNCT questions accurately, and they differ significantly in how they express uncertainty.

Similar works