This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Enhancing Diagnostic Precision: Utilising a Large Language Model to Extract U Scores from Thyroid Sonography Reports

2025 · 1 citation · Studies in Health Technology and Informatics · Open Access

Abstract

This study evaluates the performance of ChatGPT-4, a Large Language Model (LLM), in automatically extracting U scores from free-text thyroid ultrasound reports collected from University Hospitals Birmingham (UHB), UK, between 2014 and 2024. The LLM was provided with guidelines on the U classification system and extracted U scores independently from 14,248 de-identified reports, without access to human-assigned scores. The LLM-extracted scores were compared to initial clinician-assigned and refined U scores provided by expert reviewers. The LLM achieved 97.7% agreement with refined human U scores, successfully identifying the highest U score in 98.1% of reports with multiple nodules. Most discrepancies (2.5%) were linked to ambiguous descriptions, multi-nodule reports, and cases with human-documented uncertainty. While the results demonstrate the potential for LLMs to improve reporting consistency and reduce manual workload, ethical and governance challenges such as transparency, privacy, and bias must be addressed before routine clinical deployment. Embedding LLMs into reporting workflows, such as Online Analytical Processing (OLAP) tools, could further enhance reporting quality and consistency.
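
The two headline figures in the abstract (agreement with reference scores, and picking the highest U score when a report describes several nodules) can be illustrated with a minimal sketch. This is not the study's method: the study used an LLM, whereas the sketch below substitutes a simple regular expression purely to make the calculations concrete. The function names, the pattern, and the sample report text are all hypothetical, and the sketch assumes scores are written explicitly as U1 to U5.

```python
import re

# Assumption: scores appear literally as "U1".."U5", optionally written
# "U 3" or "U-3". Real reports are far messier, which is why the study
# used an LLM rather than a pattern like this.
U_SCORE_PATTERN = re.compile(r"\bU[\s-]?([1-5])\b", re.IGNORECASE)


def highest_u_score(report_text):
    """Return the highest U score mentioned in a report, or None if absent."""
    scores = [int(m) for m in U_SCORE_PATTERN.findall(report_text)]
    return max(scores) if scores else None


def percent_agreement(extracted, reference):
    """Percentage of reports where the extracted score equals the reference."""
    if len(extracted) != len(reference):
        raise ValueError("score lists must have the same length")
    if not extracted:
        raise ValueError("no reports to compare")
    matches = sum(1 for e, r in zip(extracted, reference) if e == r)
    return 100.0 * matches / len(extracted)
```

For a hypothetical multi-nodule report such as "Right lobe nodule U2. Left lobe nodule U3.", `highest_u_score` returns 3, the value that would then be compared against the reviewer-assigned score in `percent_agreement`.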

Topics

Artificial Intelligence in Healthcare and Education · Meta-analysis and systematic reviews · Reliability and Agreement in Measurement
