This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
SciTrust: Evaluating the Trustworthiness of Large Language Models for Science
Citations: 1
Authors: 3
Year: 2024
Abstract
This work presents SciTrust, a comprehensive framework for assessing the trustworthiness of large language models (LLMs) in scientific contexts, with a focus on truthfulness, accuracy, hallucination, and sycophancy. The framework introduces four novel open-ended benchmarks in Computer Science, Chemistry, Biology, and Physics, and employs a multi-faceted evaluation approach combining traditional metrics with LLM-based evaluation. SciTrust was applied to five LLMs, including one general-purpose and four scientific models, revealing nuanced strengths and weaknesses across different models and benchmarks. The study also evaluated SciTrust’s performance and scalability on high-performance computing systems. Results showed varying performance across models, with Llama3-70B-Instruct performing strongly overall, while Galactica-120B and SciGLM-6B excelled among scientific models. SciTrust aims to advance the development of trustworthy AI in scientific applications and establish a foundation for future research on model robustness, safety, and ethics in scientific contexts. We have open-sourced our framework, including all associated scripts and datasets, at https://github.com/herronej/SciTrust.
Similar Works
UCSF Chimera—A visualization system for exploratory research and analysis
2004 · 47,038 citations
SciPy 1.0: fundamental algorithms for scientific computing in Python
2020 · 35,700 citations
Clustal W and Clustal X version 2.0
2007 · 28,874 citations
The REDCap consortium: Building an international community of software platform partners
2019 · 22,727 citations
Array programming with NumPy
2020 · 20,720 citations