OpenAlex · Updated hourly · Last updated: 31 March 2026, 05:33

This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Conceptual proposal for LLM-generated FDG PET/CT follow-up reports in melanoma: a pilot study on model stability and blinded expert evaluation

2026 · 0 citations · Frontiers in Nuclear Medicine · Open Access

Citations: 0 · Authors: 14 · Year: 2026

Abstract

Purpose: Oncological patients regularly undergo PET/CT re-staging. Each re-staging requires a report that outlines the patient's current disease status and highlights relevant changes compared with the previous PET/CT. Large language models (LLMs) may assist with this documentation in the future. This pilot study assesses LLM performance, with a focus on test–retest stability and reproducibility.

Methods: Three textbook melanoma follow-up cases of increasing complexity (involving one to eight organs) were selected. Follow-up reports were written from standardized text-only prompts (no imaging data) by three authors: GPT-4o, Claude Sonnet 4, and three nuclear medicine residents. Each LLM produced three independent versions. This yielded nine reports per case (27 in total). Six blinded nuclear medicine experts (three internal, three external) performed test–retest evaluations of report quality and authorship identification.

Results: Cosine similarity analysis showed high intra-case coherence (mean 0.599–0.727) regardless of authorship. External readers consistently rated reports higher than internal readers did. LLM-generated reports received ratings comparable or superior to those of human-authored reports. Claude achieved the highest external reader scores (mean 0.926, standard deviation 0.263, on a 0–1 scale). Human performance declined with increasing case complexity, whereas Claude's performance in particular improved. External readers significantly preferred the LLM-generated impressions (Fisher's exact test, p = 0.005). Neither human nor LLM readers reliably identified authorship (balanced accuracy 0.343–0.500).

Conclusion: In this pilot, blinded expert evaluation showed that current LLMs, working from text prompts, can generate reports for melanoma [¹⁸F]fluorodeoxyglucose PET/CT whose quality is comparable to that of human-authored reports. High test–retest stability was obtained. Larger studies will be required to confirm these findings.
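The abstract reports three statistics without defining them: cosine similarity between reports, Fisher's exact test on reader preferences, and balanced accuracy of authorship identification. The sketch below uses only the Python standard library and implements the textbook definitions. It rests on three assumptions, none of which the abstract states:

* Similarity uses a bag-of-words term-count representation. The study's actual text representation is not given.
* The preference data form a 2×2 contingency table.
* All function names are hypothetical.

The toy inputs are illustrative only and are not data from the study.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using raw term counts (bag of words).

    Assumption: the study's text representation is not stated; term counts
    are used here purely for illustration.
    """
    va = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    vb = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    row and column totals that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # P(top-left cell = x) given fixed row and column totals
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Illustrative toy inputs only, not study data
    print(cosine_similarity("new liver lesion", "new liver lesion"))
    print(balanced_accuracy(["human", "human", "llm", "llm"],
                            ["human", "llm", "llm", "llm"]))
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

Balanced accuracy averages recall per class, so it is not inflated when one authorship class is more common than another. In practice, `scipy.stats.fisher_exact` and `sklearn.metrics.balanced_accuracy_score` provide tested implementations of the last two statistics.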
