This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A dataset for evaluating clinical research claims in large language models
Citations: 0
Authors: 8
Year: 2024
Abstract
Large language models (LLMs) have the potential to enhance the verification of health claims. However, issues with hallucination and comprehension of logical statements require these models to be closely scrutinized in healthcare applications. We introduce CliniFact, a scientific claim dataset created from hypothesis testing results in clinical research, covering 992 unique interventions for 22 disease categories. The dataset used study arms and interventions, primary outcome measures, and results from clinical trials to derive and label clinical research claims. These claims were then linked to supporting information describing clinical trial results in scientific publications. CliniFact contains 1,970 scientific claims from 992 unique clinical trials related to 1,540 unique publications. Intrinsic evaluation yields a Cohen’s Kappa score of 0.83, indicating strong inter-annotator agreement. In extrinsic evaluations, discriminative LLMs, such as PubMedBERT, achieved 81% accuracy and 79% F1-score, outperforming generative LLMs, such as Llama3-70B, which reached 52% accuracy and 39% F1-score. Our results demonstrate the potential of CliniFact as a benchmark for evaluating LLM performance in clinical research claim verification.
Similar works
Research electronic data capture (REDCap)—A metadata-driven methodology and workflow process for providing translational research informatics support
2008 · 49,861 citations
Gene Ontology: tool for the unification of biology
2000 · 43,871 citations
STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets
2018 · 18,785 citations
A translation approach to portable ontology specifications
1993 · 12,445 citations
Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research
2005 · 11,968 citations