This is an overview page with metadata about this paper. The full article is available from the publisher.
Accelerating Exploratory Clinical Research: An LLM-Powered Framework for Cross-Study Data Harmonization and Natural Language Querying
Citations: 0
Authors: 7
Year: 2026
Abstract
Clinical research depends on high-quality data that is standardized, accessible and interoperable. Yet data standards evolve over time and are implemented in varying ways, which hinders the secondary use of clinical trial datasets. Although individual studies adhere to the clinical data standards set forth by CDISC (Clinical Data Interchange Standards Consortium), differences in study design, in the interpretation of complex models, in controlled terminologies, and in historical conventions create inconsistencies that limit interoperability and complicate cross-study analysis. Harmonizing Study Data Tabulation Model (SDTM) datasets across studies is therefore essential to enable efficient secondary use and to accelerate evidence generation.

To address these challenges, we introduce a framework that uses Large Language Models (LLMs) to automate, at scale, the harmonization of study-specific clinical trial data in CDISC SDTM format into data that is harmonized across trials, and that enables natural language querying through a text-to-SQL agent. The system transforms siloed, study-level clinical datasets into interoperable, analysis-ready formats and lets users retrieve insights across trials without writing SQL or knowing the domain-specific schema. By constructing a semantic layer and applying retrieval-augmented prompting to models such as GPT-4o, our approach improves data access, query accuracy, and scalability across use cases. This work demonstrates the potential of LLMs to transform clinical data workflows by substantially reducing manual harmonization effort and query latency in secondary analysis workflows, and by enabling faster exploratory analysis and hypothesis generation in clinical research.
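The abstract does not say how the semantic layer, the retrieval step, or query execution are implemented. As a hedged illustration only, the Python sketch below shows the general shape of such a pipeline under several assumptions: a toy semantic layer of plain-language descriptions for SDTM-style AE and DM columns (`SEMANTIC_LAYER`), simple word overlap as the retrieval step, a hand-written value mapping (`AESEV_MAP`) standing in for LLM-proposed harmonization of legacy severity codes, and a hard-coded query (`GENERATED_SQL`) standing in for the model's text-to-SQL output. All names, data, and the prompt wording are hypothetical and are not the authors' implementation.

```python
import re
import sqlite3

# Hypothetical semantic layer: plain-language descriptions of columns in a
# harmonized cross-study store (names follow SDTM DM/AE conventions).
SEMANTIC_LAYER = {
    "dm.STUDYID": "study identifier",
    "dm.USUBJID": "unique subject identifier",
    "dm.SEX": "sex of the subject",
    "dm.AGE": "age of the subject in years",
    "ae.STUDYID": "study identifier of the adverse event record",
    "ae.USUBJID": "subject identifier linking adverse events to subjects",
    "ae.AEDECOD": "dictionary-derived adverse event term",
    "ae.AESEV": "severity of the adverse event: MILD, MODERATE or SEVERE",
}

STOPWORDS = {"the", "of", "a", "an", "to", "in", "how", "many", "or"}

# Hypothetical mapping standing in for an LLM-proposed, human-reviewed
# harmonization of study-specific severity values to one term set.
AESEV_MAP = {"1": "MILD", "2": "MODERATE", "3": "SEVERE",
             "mild": "MILD", "moderate": "MODERATE", "severe": "SEVERE"}

# Toy raw AE records from two studies with different severity conventions.
RAW_AE = [
    ("STUDY-A", "A-001", "HEADACHE", "3"),
    ("STUDY-A", "A-002", "NAUSEA", "1"),
    ("STUDY-A", "A-002", "FATIGUE", "3"),
    ("STUDY-B", "B-001", "HEADACHE", "Severe"),
    ("STUDY-B", "B-002", "RASH", "moderate"),
]

# Stand-in for the SQL a text-to-SQL model might return (no model is called).
GENERATED_SQL = ("SELECT STUDYID, COUNT(*) FROM ae WHERE AESEV = 'SEVERE' "
                 "GROUP BY STUDYID ORDER BY STUDYID;")


def tokens(text):
    """Lower-case word tokens without stopwords; a crude plural 's' is dropped."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOPWORDS}


def retrieve_schema(question, k=3):
    """Return up to k semantic-layer entries with the largest word overlap."""
    q = tokens(question)
    scored = sorted(
        ((len(q & tokens(desc)), name) for name, desc in SEMANTIC_LAYER.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(question, k=3):
    """Assemble a retrieval-augmented text-to-SQL prompt string."""
    context = "\n".join(f"- {c}: {SEMANTIC_LAYER[c]}"
                        for c in retrieve_schema(question, k))
    return ("Translate the question into one SQLite SELECT statement over the "
            "harmonized tables.\nRelevant columns:\n" + context
            + f"\nQuestion: {question}\nSQL:")


def harmonize_ae(rows):
    """Map each study-specific severity value to the harmonized term."""
    return [(s, u, term, AESEV_MAP[sev.strip().lower()])
            for s, u, term, sev in rows]


def make_db(rows):
    """Load harmonized AE rows into an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ae (STUDYID TEXT, USUBJID TEXT, "
                "AEDECOD TEXT, AESEV TEXT)")
    con.executemany("INSERT INTO ae VALUES (?, ?, ?, ?)", rows)
    return con


def run_readonly(con, sql):
    """Execute generated SQL only if it is a single SELECT statement."""
    stmt = sql.strip().rstrip(";")
    if not stmt.lower().startswith("select") or ";" in stmt:
        raise ValueError("only single SELECT statements are allowed")
    return con.execute(stmt).fetchall()


if __name__ == "__main__":
    question = "How many severe adverse events were reported in each study?"
    print(build_prompt(question))
    con = make_db(harmonize_ae(RAW_AE))
    print(run_readonly(con, GENERATED_SQL))
```

Running the module prints the assembled prompt followed by `[('STUDY-A', 2), ('STUDY-B', 1)]`: the counts only line up because the numeric code `"3"` from one study and `"Severe"` from the other were mapped to the same harmonized term before querying. The `startswith("select")` check is a deliberately minimal guard for the sketch; a real deployment would more plausibly open the database read-only and validate generated SQL more thoroughly.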