ChatGPT augmented clinical trial screening

2025 · 1 citation · 15 authors · Machine Learning · Health · Open Access

Abstract

Manual clinical trial screening is costly and inefficient. Structured EHR data often fail to capture trial criteria adequately, but large language models (LLMs) such as GPT-3.5 and GPT-4 offer a potential solution by analyzing unstructured EHR text; optimal deployment strategies, however, remain unclear. We evaluated GPT-3.5 and GPT-4 in screening 74 patients (35 eligible, 39 ineligible) for a phase 2 head and neck cancer trial. EHR data included progress notes, pathology reports, and imaging reports. Fourteen trial criteria (e.g., stage, histology, prior treatments) were manually annotated as the ground truth. Three prompting methods were tested: structured output (SO), chain of thought (CoT), and self-discover (SD), with further optimization by expert guidance (EG) or LLM guidance (LLM-G). Prompts were refined on 20 patients and validated on 54. Performance metrics included accuracy, sensitivity, specificity, and micro-F1 score. Strict eligibility required all criteria to be met, while proportional eligibility flagged patients based on the proportion of criteria met. Screening time and cost were also analyzed. GPT-3.5 achieved a median accuracy of 0.761, with SO + EG performing best. GPT-4 had higher accuracy (0.838), with SD achieving the highest Youden index (0.729). For strict eligibility, GPT-4’s CoT + EG approach had the highest accuracy (0.65). Proportional eligibility performed better overall, with GPT-4’s CoT + LLM-G achieving the highest AUC (0.82). Screening time per patient ranged from 1.4 to 12.4 min, and costs ranged from $0.02 to $0.27. LLMs can effectively screen for individual trial criteria but struggle to identify patients meeting all criteria; proportional eligibility offers a practical alternative. GPT-4 outperformed GPT-3.5, but at higher cost and longer screening times. LLMs should supplement manual chart reviews for trial screening.
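To make the two aggregation rules concrete, here is a minimal Python sketch, assuming each patient's 14 criteria have already been classified by the LLM into per-criterion booleans. The function names, the 0.8 threshold, and the example sensitivity/specificity values are illustrative assumptions, not taken from the paper; sweeping the threshold over the proportion of satisfied criteria is what would trace out the ROC curve behind the reported AUC.

```python
# Sketch of the two eligibility aggregation strategies described in the
# abstract. Assumes per-criterion LLM decisions are already parsed into
# booleans; all names and numeric values below are illustrative.
from typing import Sequence

def strict_eligibility(criteria_met: Sequence[bool]) -> bool:
    # Strict rule: a patient is eligible only if every criterion is met.
    return all(criteria_met)

def proportional_eligibility(criteria_met: Sequence[bool],
                             threshold: float = 0.8) -> bool:
    # Proportional rule: flag the patient when the fraction of satisfied
    # criteria reaches a chosen threshold (assumed value, not from the paper).
    return sum(criteria_met) / len(criteria_met) >= threshold

def youden_index(sensitivity: float, specificity: float) -> float:
    # Youden's J statistic: J = sensitivity + specificity - 1.
    return sensitivity + specificity - 1

if __name__ == "__main__":
    patient = [True] * 12 + [False] * 2      # 12 of 14 criteria met
    print(strict_eligibility(patient))        # False: one unmet criterion blocks strict eligibility
    print(proportional_eligibility(patient))  # True: 12/14 ~ 0.86 >= 0.8
    print(youden_index(0.86, 0.87))           # 0.73 (example values only)
```

The example illustrates why the abstract finds proportional eligibility more practical: under the strict rule, a single misclassified criterion flips the patient-level decision, while the proportional score degrades gracefully.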
