OpenAlex · Updated hourly · Last updated: 2026-03-20, 20:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating the Use of Large Language Models to Answer Patient-Facing Clinical Trial Questions

2025 · 0 citations · JCO Oncology Advances · Open Access

Citations: 0 · Authors: 8 · Year: 2025

Abstract

PURPOSE: The complexity of clinical trial documentation is a significant barrier to patient understanding and informed consent. Although large language models (LLMs) show promise for simplifying this information, their tendency to generate incorrect information (hallucinate) poses serious safety risks. We aimed to systematically compare the performance of a leading proprietary LLM and a widely used open-source LLM in answering authentic patient questions about clinical trials.

METHODS: We curated 349 unique patient queries from 23 authoritative oncology and regulatory websites. A representative subset of these questions was posed to a high-performance proprietary model (GPT-4o) and a popular open-source model (Llama-3.2-8B). Two physicians, blinded to the models' identities, independently evaluated the paired responses to assess accuracy, clarity, safety, and other quality dimensions.

RESULTS: A total of 374 responses were evaluated. GPT-4o demonstrated superior reliability, with no instances of information fabrication (0 of 188 responses). In contrast, Llama-3.2-8B produced fabricated claims in 14.5% of its responses (27 of 186), including a significant error regarding the ethics of placebo controls. GPT-4o also outperformed Llama-3.2-8B across other key domains, including clarity, usefulness, and self-awareness in acknowledging uncertainty.

CONCLUSION: The proprietary model, GPT-4o, was significantly more reliable and safer than the open-source Llama-3.2-8B for answering patient questions about clinical trials. These findings highlight the critical need for rigorous, comparative evaluation of LLMs before their deployment in patient-facing applications. To ensure patient safety, health care systems should pair high-performing models with structured safety guardrails and continuous monitoring.
