Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Benchmarking large language models for predictive modeling in biomedical research with a focus on reproductive health

2026·2 Zitationen·Cell Reports MedicineOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2026

Jahr

Abstract

Large language models (LLMs) are increasingly used for code generation and data analysis. This study assesses LLM performance across four predictive tasks from three DREAM challenges: gestational age regression from transcriptomics and DNA methylation and classification of preterm birth and early preterm birth from microbiome data. We prompt LLMs with task descriptions, data locations, and target outcomes and then run LLM-generated code to fit prediction models and determine accuracy on test sets. Among the eight LLMs tested, o3-mini-high, 4o, DeepseekR1, and Gemini 2.0 can complete at least one task. R code generation is more successful (14/16) than Python (7/16). OpenAI's o3-mini-high outperforms others, completing 7/8 tasks. Test set performance of the top LLM-generated models matches or exceeds the median-participating team for all four tasks and surpasses the top-performing team for one task (p = 0.02). These findings underscore the potential of LLMs to democratize predictive modeling in omics and increase research output.

Benchmarking large language models for predictive modeling in biomedical research with a focus on reproductive health

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen