This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Leveraging Large Language Models for Accurate AO Fracture Classification from CT Text Reports
Citations: 0 · Authors: 9 · Year: 2025
Abstract
Large language models (LLMs) have shown promising potential in analyzing complex textual data, including radiological reports. These models can assist clinicians, particularly those with limited experience, by integrating and presenting diagnostic criteria within radiological classifications. However, before clinical adoption, LLMs must be rigorously validated by medical professionals to ensure accuracy, especially in the context of advanced radiological classification systems. This study evaluates the performance of four LLMs (ChatGPT-4o, AmbossGPT, Claude 3.5 Sonnet, and Gemini 2.0 Flash) in classifying fractures based on the AO classification system using CT reports. A dataset of 292 fictitious physician-generated CT reports, representing 310 fractures, was used to retrospectively assess the accuracy of each LLM in AO fracture classification. Performance was evaluated by comparing the models' classifications to ground-truth labels, with accuracy rates analyzed across different fracture types and subtypes. ChatGPT-4o and AmbossGPT achieved the highest overall accuracy (74.6% and 74.3%, respectively), outperforming Claude 3.5 Sonnet (69.5%) and Gemini 2.0 Flash (62.7%). Statistically significant differences were observed in fracture type classification, particularly between ChatGPT-4o and Gemini 2.0 Flash (Δ12%, p < 0.001). While all models demonstrated strong bone recognition rates (90-99%), their accuracy in fracture subtype classification remained lower (71-77%), indicating limitations in nuanced diagnostic categorization. LLMs show potential for assisting radiologists with initial fracture classification, particularly in high-volume or resource-limited settings. However, their performance remains inconsistent for detailed subtype classification, highlighting the need for further refinement and validation before clinical integration into advanced diagnostic workflows.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,287 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,140 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,534 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,450 citations