This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Benchmarking large language models for predictive modeling in biomedical research with a focus on reproductive health
2
Citations
12
Authors
2026
Year
Abstract
Large language models (LLMs) are increasingly used for code generation and data analysis. This study assesses LLM performance across four predictive tasks from three DREAM challenges: gestational age regression from transcriptomics and DNA methylation and classification of preterm birth and early preterm birth from microbiome data. We prompt LLMs with task descriptions, data locations, and target outcomes and then run LLM-generated code to fit prediction models and determine accuracy on test sets. Among the eight LLMs tested, o3-mini-high, 4o, DeepseekR1, and Gemini 2.0 can complete at least one task. R code generation is more successful (14/16) than Python (7/16). OpenAI's o3-mini-high outperforms others, completing 7/8 tasks. Test set performance of the top LLM-generated models matches or exceeds the median participating team for all four tasks and surpasses the top-performing team for one task (p = 0.02). These findings underscore the potential of LLMs to democratize predictive modeling in omics and increase research output.
Similar works
Epidemiology and causes of preterm birth
2008 · 7,716 citations
Global, regional, and national estimates of levels of preterm birth in 2014: a systematic review and modelling analysis
2018 · 3,119 citations
Intrauterine Infection and Preterm Delivery
2000 · 2,556 citations
Single-cell reconstruction of the early maternal–fetal interface in humans
2018 · 2,448 citations
Born Too Soon: The global epidemiology of 15 million preterm births
2013 · 2,227 citations