This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The Performance of Large Language Models in Bone Tumour Imaging: Comparative Analysis with Radiologists Using Text and Image-based Evaluation
0
Citations
4
Authors
2026
Year
Abstract
Large language models (LLMs) are emerging as transformative tools in radiology, with the potential to enhance diagnostic workflows. However, their performance in bone tumour imaging, a domain requiring both knowledge-based reasoning and visual interpretation, remains unclear. This study compares the diagnostic performance of LLMs with that of radiologists across text- and image-based tasks. In this cross-sectional study, two LLMs (ChatGPT-5 and Gemini 2.5 Pro) and two radiologists, a junior radiologist (JR) and a senior radiologist (SR), were evaluated using fifty text-based multiple-choice questions (MCQs) and fifty radiographs with clinical vignettes from a public dataset. Participants classified lesions as benign or malignant, identified “don't-touch” lesions, and provided the most likely diagnosis. Responses were scored against a reference standard, and accuracies were compared using McNemar's tests. In MCQs, ChatGPT-5 (92.0%) and Gemini 2.5 Pro (90.0%) achieved accuracies comparable to SR (88.0%) and JR (84.0%) (p > 0.05). For benign-malignant classification, LLMs (50.0%, 48.0%) were similar to JR (66.0%) but inferior to SR (94.0%) (p < 0.05). In identifying “don't-touch” lesions, LLMs (46.0%) were similar to JR (64.0%) yet underperformed compared with SR (92.0%) (p < 0.05). For specific diagnosis, LLMs showed low accuracy (38.0%, 30.0%) versus JR (60.0%) and SR (86.0%) (p < 0.01). LLMs may serve as useful adjuncts for clinicians and radiologists in text-based tasks and in distinguishing between benign and malignant bone tumours. However, their diagnostic accuracy remains limited.
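The abstract names McNemar's test as the method for comparing paired accuracies on the same fifty cases. As a rough illustration only, the sketch below computes a two-sided exact McNemar p-value from the two discordant counts. The abstract does not say whether the exact or the chi-square variant was used, and it does not report per-case results, so the function name `mcnemar_exact` and the counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: cases reader A answered correctly and reader B incorrectly
    c: cases reader A answered incorrectly and reader B correctly
    Cases where both readers agree do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, NOT taken from the study:
# 2 cases only the LLM got right, 12 cases only the senior radiologist got right.
p_value = mcnemar_exact(b=2, c=12)
print(f"p = {p_value:.4f}")
```

Only the discordant cases drive the result, which is why two readers with different headline accuracies on fifty cases can still show no significant difference.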
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,250 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,109 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,482 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,434 citations