OpenAlex · Updated hourly · Last updated: 18.03.2026, 22:10

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

The Performance of Large Language Models in Bone Tumour Imaging: Comparative Analysis with Radiologists Using Text and Image-based Evaluation

2026 · 0 citations · Proceedings of the Bulgarian Academy of Sciences · Open Access

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Large language models (LLMs) are emerging as transformative tools in radiology, with the potential to enhance diagnostic workflows. However, their performance in bone tumour imaging, a domain requiring both knowledge-based reasoning and visual interpretation, remains unclear. This study compares the diagnostic performance of LLMs with that of radiologists across text- and image-based tasks. In this cross-sectional study, two LLMs (ChatGPT-5 and Gemini 2.5 Pro) and two radiologists (a junior radiologist, JR, and a senior radiologist, SR) were evaluated using fifty text-based multiple-choice questions (MCQs) and fifty radiographs with clinical vignettes from a public dataset. Participants classified lesions as benign or malignant, identified “don't-touch” lesions, and provided the most likely diagnosis. Responses were benchmarked against a reference standard using McNemar's tests. In the MCQs, ChatGPT-5 (92.0%) and Gemini 2.5 Pro (90.0%) achieved accuracies comparable to the SR (88.0%) and JR (84.0%) (p > 0.05). For benign–malignant classification, the LLMs (50.0% and 48.0%) were similar to the JR (66.0%) but inferior to the SR (94.0%) (p < 0.05). In identifying “don't-touch” lesions, the LLMs (46.0%) matched the JR (64.0%) yet underperformed the SR (92.0%) (p < 0.05). For specific diagnosis, the LLMs showed low accuracy (38.0% and 30.0%) versus the JR (60.0%) and SR (86.0%) (p < 0.01). LLMs may serve as useful adjuncts for clinicians and radiologists in text-based tasks and in distinguishing between benign and malignant bone tumours; however, their diagnostic accuracy remains limited.
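The abstract benchmarks paired accuracies with McNemar's test, which considers only the cases where the two raters disagree. A minimal sketch of the exact (binomial) form of the test, using hypothetical discordant counts rather than the study's actual data:

```python
# Illustrative sketch (hypothetical counts, not the study's data): McNemar's
# test compares two raters scored correct/incorrect on the same cases.
# Only the discordant pairs carry information about a performance difference.
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant counts:
    b = cases only rater A got right, c = cases only rater B got right."""
    n = b + c
    k = min(b, c)
    # Under H0 (equal performance), each discordant case is a fair coin flip,
    # so the smaller count follows Binomial(n, 0.5).
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(p, 1.0)

# Hypothetical split over 50 cases: 5 cases only the LLM answered correctly,
# 22 cases only the radiologist answered correctly.
print(round(mcnemar_exact(5, 22), 4))  # prints 0.0015
```

Because concordant cases (both right or both wrong) cancel out, the test stays valid even when overall accuracies are computed on the same fifty images.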
