OpenAlex · Updated hourly · Last updated: 30 April 2026, 20:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Reasoning Models for Text Mining in Oncology: A Comparison Between o1 Preview, GPT-4o, and GPT-5 at Different Reasoning Levels

2025 · 1 citation · 6 authors · JCO Clinical Cancer Informatics

Abstract

PURPOSE: Chain-of-thought prompting is a method to make large language models generate intermediate reasoning steps when solving a complex problem. OpenAI's o1 preview and GPT-5 have been trained to create such a chain of thought internally before giving a response and have been claimed to surpass various benchmarks requiring complex reasoning. The purpose of this study was to evaluate their performance in text mining in oncology.

METHODS: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. GPT-4o, o1 preview, and GPT-5 at different reasoning effort settings were instructed to do the same classification based on the publications' abstracts.

RESULTS: For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76-0.83) and 0.91 (0.89-0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95-0.98) and 0.99 (0.99-1.00), respectively. For GPT-5, the F1 scores for predicting the eligibility of patients with localized disease increased from 0.84 to 0.93 and 0.94 with increased reasoning effort. F1 scores for metastatic disease were 0.97, 0.99, and 0.99.

CONCLUSION: o1 preview outperformed GPT-4o in extracting if people with localized and/or metastatic disease were eligible for a trial from its abstract. GPT-5 at high reasoning effort settings outperformed both GPT-4o and o1 preview, supporting the notion that reasoning models could become the new standard for text mining in medicine.
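
The F1 scores reported in the abstract combine precision and recall for a binary label such as "patients with localized disease were eligible". The following is a minimal sketch of how such a score, and an interval around it, could be computed from reference labels and model predictions. The toy labels, the function names `f1_score` and `bootstrap_ci`, and the use of a percentile bootstrap are assumptions for illustration only; the abstract does not state how the intervals in parentheses were derived.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for binary labels: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all (score undefined).
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample trials with replacement, keeping label/prediction pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical toy data: 1 = trial allowed patients with localized disease.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"F1 = {score:.2f} ({lower:.2f}-{upper:.2f})")
```

In this toy example there are four true positives, one false positive, and one false negative, so the F1 score is 2·4 / (2·4 + 1 + 1) = 0.80. With only eight items the bootstrap interval is wide; the study's 600 trials would give a much narrower one.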

Similar works