OpenAlex · Updated hourly · Last updated: 15.03.2026, 20:49

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Quantitative Analysis of GPT-4 model: Optimizing Patient Eligibility Classification for Clinical Trials and Reducing Expert Judgment Dependency

2024 · 3 citations · 9 authors

Abstract

Objective: Generative Pre-trained Transformer 4 (GPT-4) is a large multimodal language model created by OpenAI and the fourth in its series of GPT foundation models. Although GPT-4 has been applied in many domains, its ability to categorize patients by their eligibility for clinical trials is less well known. The primary objective of this work is to evaluate the accuracy and efficacy of GPT-4 for patient eligibility evaluation.

Data: Ten US non-small cell lung cancer (NSCLC) drug-only interventional clinical trials were selected from clinicaltrials.gov. Ten patient profiles were manually created for each clinical trial using case presentations published in peer-reviewed medical journals by clinicians/epidemiologists. The dataset comprised two sets of adult patient profiles (50 eligible and 50 non-eligible patients, for a total of 100) spanning a range of complexities, from complex to simple cases. The 100-case dataset was then analyzed, comparing the accuracy of the large language model, GPT-4, against the human experts' evaluation.

Analysis: Various data tuning scenarios (80% and 0%) were evaluated, explicitly examining the model's capacity to mimic the performance of human experts in classifying patient eligibility. Model evaluations were compared to human evaluations to ensure reliable accuracy results. To measure efficacy, age, gender, and sensitivity and specificity analyses were conducted, providing a comprehensive examination of the model's performance across various dimensions.

Results: GPT-4 showed 100% test accuracy in scenarios with tuning on 80% of cases, and 95% test accuracy without tuning on 80% of cases. With 0% tuning, GPT-4 showed 86% test accuracy across all cases. Furthermore, the GPT-4 model was compared to human patient categorization based on patient eligibility for clinical trials. The bias shown by GPT-4 did not differ significantly from that of the human experts' evaluation, with respect to either gender or age.
Conclusion: The GPT-4 model has demonstrated a high level of accuracy and an unbiased approach to patient classification compared to human experts. However, further research with more extensive and diverse datasets is recommended to confirm these findings. Other LLMs may also be tested in clinical trial settings.
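The evaluation metrics named in the abstract (test accuracy, sensitivity, specificity) can be sketched for a balanced 50/50 eligibility dataset like the one described. This is an illustrative Python sketch, not the study's actual code or data; the labels below are hypothetical and are chosen only to reproduce an 86%-accuracy scenario.

```python
# Sketch: computing accuracy, sensitivity, and specificity from
# expert vs. model eligibility labels (True = eligible).
# Illustrative only; not the study's actual data or pipeline.

def eligibility_metrics(expert, model):
    """Compare model predictions to expert labels."""
    tp = sum(e and m for e, m in zip(expert, model))          # eligible, flagged eligible
    tn = sum(not e and not m for e, m in zip(expert, model))  # non-eligible, flagged non-eligible
    fp = sum(not e and m for e, m in zip(expert, model))      # non-eligible, flagged eligible
    fn = sum(e and not m for e, m in zip(expert, model))      # eligible, flagged non-eligible
    return {
        "accuracy": (tp + tn) / len(expert),
        "sensitivity": tp / (tp + fn),  # share of eligible patients correctly identified
        "specificity": tn / (tn + fp),  # share of non-eligible patients correctly identified
    }

# Hypothetical balanced dataset: 50 eligible, 50 non-eligible patients,
# with 7 errors in each group (mirroring an 86%-accuracy scenario).
expert = [True] * 50 + [False] * 50
model = [True] * 43 + [False] * 7 + [False] * 43 + [True] * 7
print(eligibility_metrics(expert, model))
# → {'accuracy': 0.86, 'sensitivity': 0.86, 'specificity': 0.86}
```

On a balanced dataset like this one, accuracy alone can mask asymmetric errors; reporting sensitivity and specificity separately, as the study does, shows whether the model errs more on eligible or non-eligible patients.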
