This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
0
Citations
120
Authors
2025
Year
Abstract
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research, yet their progress is shaped by the complex nature of scientific data. This survey presents a comprehensive, data-centric synthesis that reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate. We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge, emphasizing the multimodal, cross-scale, and domain-specific challenges that differentiate scientific corpora from general natural language processing datasets. We systematically review recent Sci-LLMs, from general-purpose foundations to specialized models across diverse scientific disciplines, alongside an extensive analysis of over 270 pre-/post-training datasets, showing why Sci-LLMs pose distinct demands -- heterogeneous, multi-scale, uncertainty-laden corpora that require representations preserving domain invariance and enabling cross-modal reasoning. On evaluation, we examine over 190 benchmark datasets and trace a shift from static exams toward process- and discovery-oriented assessments with advanced evaluation protocols. These data-centric analyses highlight persistent issues in scientific data development and discuss emerging solutions involving semi-automated annotation pipelines and expert validation. Finally, we outline a paradigm shift toward closed-loop systems where autonomous agents based on Sci-LLMs actively experiment, validate, and contribute to a living, evolving knowledge base. Collectively, this work provides a roadmap for building trustworthy, continually evolving artificial intelligence (AI) systems that function as a true partner in accelerating scientific discovery.
Similar works
UCSF Chimera—A visualization system for exploratory research and analysis
2004 · 47,010 citations
AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
2009 · 35,451 citations
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
1989 · 31,298 citations
The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals
2007 · 29,326 citations
VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data
2011 · 24,062 citations
Authors
- Mingjun Hu
- Chenglong Ma
- Wei Li
- Wanghan Xu
- Jiamin Wu
- Jucheng Hu
- Tianbin Li
- Guohang Zhuang
- Jiaqi Liu
- Yingzhou Lu
- Ying Chen
- Chaoyang Zhang
- Cheng Tan
- Ying Jie
- Guo-Cheng Wu
- Shuo Gao
- Pengcheng Chen
- Jiashi Lin
- Haitao Wu
- Lulu Chen
- Fengxiang Wang
- Yuanyuan Zhang
- Xiangyu Zhao
- Feilong Tang
- Encheng Su
- Jing Ning
- Xinyao Liu
- Yu Du
- Changkai Ji
- Pengfei Jiang
- Cheng Tang
- Ziyan Huang
- Jiyao Liu
- Jiaqi Wei
- Yue-Jin Yang
- Xiang Zhang
- Guangshuai Wang
- Yang Yue
- Huihui Xu
- Ziyang Chen
- Yizhou Wang
- Chen Tang
- Jianyu Wu
- Yuchen Ren
- Siyuan Yan
- Zhonghua Wang
- Zhongxing Xu
- Shiyan Su
- Shangquan Sun
- Runkai Zhao
- Zhisheng Zhang
- Dingkang Yang
- Jinjie Wei
- Jiaqi Wang
- Jin-Sen Xu
- Jiangtao Yan
- Wenhao Tang
- Hongze Zhu
- Yu Liu
- Fudi Wang
- Yiqing Shen
- Yuanfeng Ji
- Yanzhou Su
- Tong Xie
- Hongming Shan
- Chun-Mei Feng
- Zhi Bin Hou
- Diping Song
- Lihao Liu
- Yanyan Huang
- Lequan Yu
- Bin Fu
- Shujun Wang
- X.L. Li
- Xiaowei Hu
- Yun Gu
- Ben Fei
- Benyou Wang
- Y. Cao
- Minjie Shen
- Jie Xu
- Haodong Duan
- Yan Fang
- Hongxia Hao
- Jielan Li
- Jiajun Du
- Yanbo Wang
- Imran Razzak
- Zhongying Deng
- Chi Zhang
- Lijun Wu
- Conghui He
- Lu Zhaohui
- Jinhai Huang
- Wenqi Shao
- Yihao Liu
- Siqi Luo
- Yi Xin
- Xiaohong Liu
- Fenghua Ling
- Yuqiang Li
- Aoran Wang
- Shanchao Sun
- Qiusheng Zheng
- Nanqing Dong
- Tianfan Fu
- Dongzhan Zhou
- Yan Lü
- Wenlong Zhang
- Ye Jin
- Jun Cai
- Yirong Chen
- Wanli Ouyang
- Yu Qiao
- Zongyuan Ge
- Shixiang Tang
- Junjun He
- Chunfeng Song
- Lei Bai
- Bowen Zhou