This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Close, But no Cigar: Comparative Evaluation of ChatGPT-4o and OpenAI o1-preview in Answering Pancreatic Ductal Adenocarcinoma-Related Questions
1
Citation
9
Authors
2025
Year
Abstract
Background: This study aimed to evaluate the effectiveness of ChatGPT-4o and OpenAI o1-preview in responding to pancreatic ductal adenocarcinoma (PDAC)-related queries. The study assessed both LLMs' accuracy, comprehensiveness, and safety when answering clinical questions, based on the National Comprehensive Cancer Network® (NCCN) Clinical Practice Guidelines for PDAC.

Methods: The study used a 20-question dataset derived from clinical scenarios related to PDAC. Two board-certified surgeons independently evaluated the responses by ChatGPT-4o and OpenAI o1-preview for their accuracy, comprehensiveness, and safety using a Likert scale. Statistical analyses were conducted to compare the performances of the two models. We also analyzed the impact of OpenAI o1-preview's Chain of Thought (CoT) technology.

Results: Both models demonstrated high median scores across all dimensions (5 out of 5). OpenAI o1-preview outperformed ChatGPT-4o in comprehensiveness (p = 0.026) and demonstrated superior reasoning ability, with a higher accuracy rate of 75% compared to 60% for ChatGPT-4o. OpenAI o1-preview also generated more concise responses (median 64 vs. 82 words, p < 0.001). The CoT method in OpenAI o1-preview appeared to enhance its reasoning capabilities, particularly in complex treatment decisions. However, both models made critical errors in some complex clinical scenarios.

Conclusion: OpenAI o1-preview, with its CoT technology, demonstrated higher comprehensiveness than ChatGPT-4o and showed a trend toward improved accuracy. However, both models still make critical errors that could harm patients. Even the most advanced models are not yet suitable for providing reliable medical information and cannot serve as decision-making assistants.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations