OpenAlex · Updated hourly · Last updated: 21.03.2026, 05:12

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Multi-center benchmarking of large language models for clinical decision support in lung cancer screening

2025 · 0 citations · Cell Reports Medicine · Open Access

0 citations · 15 authors · 2025

Abstract

Large language models (LLMs) are increasingly explored for clinical applications, but their ability to generate management recommendations for lung cancer screening remains uncertain. In this cross-sectional, multi-center study, 148 anonymized low-dose computed tomography (CT) reports from three healthcare institutions are used to assess the readability, accuracy, and consistency of four widely adopted models (GPT-3.5, GPT-4, Claude 3 Sonnet, and Claude 3 Opus). Among them, Claude 3 Opus produces the most readable recommendations, while GPT-4 achieves the highest clinical accuracy. Importantly, performance does not differ significantly across institutions, underscoring the robustness of these models to variations in reporting templates and their utility in diverse healthcare settings. In an exploratory analysis, two state-of-the-art models, the proprietary GPT-4o and its open-source counterpart DeepSeek-R1, show performance comparable to GPT-4 and superior to GPT-3.5. These findings highlight the potential of LLMs to enhance clinical decision support in lung cancer screening across diverse healthcare settings.
