This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Reasoning Models for Text Mining in Oncology: A Comparison Between o1 Preview, GPT-4o, and GPT-5 at Different Reasoning Levels
Citations: 1
Authors: 6
Year: 2025
Abstract
PURPOSE: Chain-of-thought prompting is a method to make large language models generate intermediate reasoning steps when solving a complex problem. OpenAI's o1 preview and GPT-5 have been trained to create such a chain of thought internally before giving a response and have been claimed to surpass various benchmarks requiring complex reasoning. The purpose of this study was to evaluate their performance in text mining in oncology.
METHODS: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. GPT-4o, o1 preview, and GPT-5 at different reasoning effort settings were instructed to do the same classification based on the publications' abstracts.
RESULTS: For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76-0.83) and 0.91 (0.89-0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95-0.98) and 0.99 (0.99-1.00), respectively. For GPT-5, the F1 scores for predicting the eligibility of patients with localized disease increased from 0.84 to 0.93 and 0.94 with increased reasoning effort. F1 scores for metastatic disease were 0.97, 0.99, and 0.99.
CONCLUSION: o1 preview outperformed GPT-4o in extracting if people with localized and/or metastatic disease were eligible for a trial from its abstract. GPT-5 at high reasoning effort settings outperformed both GPT-4o and o1 preview, supporting the notion that reasoning models could become the new standard for text mining in medicine.
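The abstract reports F1 scores for two yes/no labels per trial (localized disease eligible, metastatic disease eligible). As a rough illustration of how such a score can be computed from reference labels and model predictions, here is a minimal sketch. The function name `f1_score` and the toy label lists are hypothetical and are not taken from the study; the authors' actual evaluation code and the method behind the reported ranges are not described on this page.

```python
def f1_score(y_true, y_pred):
    """Return the F1 score of the positive class for two equal-length boolean lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: six trials, one label (e.g. "localized disease eligible").
reference = [True, True, True, False, False, True]   # manual classification
predicted = [True, True, False, False, True, True]   # model output parsed to booleans

print(f1_score(reference, predicted))
```

In a setup like the one described, this computation would be run once per label (localized, metastatic) and once per model or reasoning-effort setting, so that the resulting scores can be compared side by side.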
Similar works
Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support
2008 · 50,615 citations
Gene Ontology: tool for the unification of biology
2000 · 44,233 citations
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
2018 · 18,962 citations
Haploview: analysis and visualization of LD and haplotype maps
2004 · 14,680 citations
A translation approach to portable ontology specifications
1993 · 12,487 citations