This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Quantitative Analysis of GPT-4 model: Optimizing Patient Eligibility Classification for Clinical Trials and Reducing Expert Judgment Dependency
Citations: 3
Authors: 9
Year: 2024
Abstract
Objective: Generative Pre-trained Transformer 4 (GPT-4) is a large multimodal language model created by OpenAI and the fourth in its series of GPT foundation models. Although GPT-4 has been applied in many settings, its ability to categorize patients by eligibility for clinical trials is less well known. The primary objective of this work is to evaluate the accuracy and efficacy of GPT-4 for patient eligibility evaluation.

Data: Ten US NSCLC drug-only interventional clinical trials were selected from clinicaltrials.gov. Ten patient profiles were manually created for each clinical trial using case presentations published in peer-reviewed medical journals by clinicians/epidemiologists. The dataset comprised two sets of adult patient profiles (50 eligible and 50 non-eligible patients, 100 in total) spanning a range of complexities, from complex to simple cases. The 100-case dataset was then analyzed, comparing the accuracy of the large language model, GPT-4, against the human expert's evaluation.

Analysis: Two data tuning scenarios (80% tuning and 0% tuning) were evaluated, explicitly examining the model's capacity to mimic the performance of human experts in classifying patient eligibility. Model evaluations were compared to human evaluations to ensure reliable accuracy results. To measure efficacy, age, gender, and sensitivity and specificity analyses were conducted, providing a comprehensive examination of the model's performance across multiple dimensions.

Results: GPT-4 showed 100% test accuracy in the scenario with tuning on 80% of cases, and 95% test accuracy on those cases without tuning. On all cases with 0% tuning, GPT-4 showed 86% test accuracy. Furthermore, GPT-4's patient categorization by trial eligibility was compared to the human expert's. The bias shown by GPT-4 did not differ significantly from that of the human expert's evaluation with respect to either gender or age.
Conclusion: The GPT-4 model has demonstrated a high level of accuracy and an unbiased approach to patient classification compared to human experts. However, further research with more extensive and diverse datasets is recommended to confirm these findings. Other LLMs may also be tested in clinical trial settings.
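The sensitivity and specificity analysis described in the abstract can be illustrated with a minimal sketch. This is not the study's actual pipeline; the labels below are hypothetical (1 = eligible, 0 = non-eligible), and the function names are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary eligibility labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    """Accuracy, sensitivity (eligible correctly accepted),
    specificity (non-eligible correctly rejected)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical expert labels vs. model predictions for 10 patients
expert = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
acc, sens, spec = metrics(expert, model)  # 0.8, 0.8, 0.8
```

In the study's balanced design (50 eligible, 50 non-eligible), accuracy, sensitivity, and specificity would be computed in exactly this way from the model's predictions against the expert's judgments.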
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations