OpenAlex · Updated hourly · Last updated: 11 March 2026, 06:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accelerating Exploratory Clinical Research: An LLM-Powered Framework for Cross-Study Data Harmonization and Natural Language Querying

2026 · 0 citations · Open Access

Citations: 0 · Authors: 7 · Year: 2026

Abstract

Clinical research depends on high-quality data that is standardized, accessible, and interoperable. Yet data standards that evolve over time, and variations in how they are implemented, hinder the secondary use of clinical trial datasets. Although individual studies adhere to the clinical data standards set forth by CDISC (Clinical Data Interchange Standards Consortium), differences in study design, interpretation of complex models, controlled terminologies, and historical conventions create inconsistencies that limit interoperability and complicate cross-study analysis. As a result, harmonizing Study Data Tabulation Model (SDTM) datasets across studies is essential to enable efficient secondary use and accelerate evidence generation.

To address these challenges, we introduce a framework that leverages Large Language Models (LLMs) to automate, at scale, the harmonization of study-specific clinical trial data in CDISC SDTM format into data that is consistent across trials, and that enables natural language querying via a text-to-SQL agent. The system transforms siloed study datasets into interoperable, analysis-ready formats while empowering users to retrieve insights across trials without needing SQL or domain-specific schema knowledge. By constructing a semantic layer and applying retrieval-augmented prompting to models like GPT-4o, our approach improves data access, query accuracy, and scalability across use cases. This work demonstrates the potential of LLMs to transform clinical data workflows by substantially reducing manual harmonization effort and query latency in secondary analysis, and by enabling faster exploratory analysis and hypothesis generation in clinical research.
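The abstract names a semantic layer combined with retrieval-augmented prompting for text-to-SQL but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' method: the table layout, the column descriptions, and the helper names `retrieve` and `build_prompt` are hypothetical, and a crude word-overlap score stands in for whatever retrieval the framework actually uses. No LLM is called; the sketch stops at assembling the prompt.

```python
import re

# Hypothetical semantic layer: harmonized column -> plain-language description.
# The table layout is illustrative; only the variable names (USUBJID, AETERM, ...)
# follow SDTM naming conventions.
SEMANTIC_LAYER = {
    "dm.USUBJID": "unique subject identifier across studies",
    "dm.AGE": "age of the subject at study start in years",
    "dm.SEX": "sex of the subject",
    "ae.AETERM": "reported term for the adverse event",
    "ae.AESEV": "severity of the adverse event (mild, moderate, severe)",
    "lb.LBTEST": "name of the laboratory test",
}


def tokens(text):
    """Lowercase word set used for a crude lexical overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, k=3):
    """Return up to k columns whose descriptions share words with the question."""
    q = tokens(question)
    scored = [
        (len(q & tokens(desc)), column)
        for column, desc in SEMANTIC_LAYER.items()
    ]
    # Sort by overlap (descending), then by column name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [column for score, column in scored[:k] if score > 0]


def build_prompt(question, k=3):
    """Assemble a text-to-SQL prompt containing only the retrieved schema context."""
    context = "\n".join(
        f"- {column}: {SEMANTIC_LAYER[column]}" for column in retrieve(question, k)
    )
    return (
        "Write one SQL query for the question below.\n"
        "Use only these columns:\n"
        f"{context}\n"
        f"Question: {question}\n"
    )


prompt = build_prompt("How many severe adverse event records are there?")
print(prompt)
```

Restricting the prompt to the retrieved columns is what lets a user ask a question without knowing the schema: the semantic layer, rather than the user, supplies the column names the model is allowed to use.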

Topics

Scientific Computing and Data Management · Biomedical Text Mining and Ontologies · Electronic Health Records Systems