OpenAlex · Updated hourly · Last updated: 2 April 2026, 01:34

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Artificial Intelligence for Automated, Highly Accurate, and Scalable Multimodal EHR Data Abstraction

2026 · 0 citations · medRxiv · Open Access
Citations: 0 · Authors: 8 · Year: 2026

Abstract

Electronic health records (EHRs) contain rich multimodal data but remain underutilized for populating clinical registries because manual abstraction is slow and costly. We developed an AI-driven pipeline to automate data abstraction for variables in the Society of Thoracic Surgeons Adult Cardiac Surgery Database (ACSD). Models were developed using Mass General Brigham data and externally validated on Hartford HealthCare data. The pipeline processes ten clinical EHR sources (seven unstructured text types and three structured data types), each encoded in three ways: with two language-model embeddings and with term frequency–inverse document frequency (TF-IDF). This approach yielded 30 source-specific models per target variable (ten sources times three encodings), whose predictions were aggregated by an ensemble meta-learner. A dual-threshold confidence framework then enforced registry-grade accuracy standards and deferred uncertain predictions to human review. The pipeline achieved an overall accuracy exceeding 99% across 647 registry variables while automatically completing 49.5% and 43.2% of variables at the two sites, respectively. These results demonstrate that AI-assisted abstraction can substantially reduce the burden of clinical registry data collection while maintaining high accuracy.
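The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of the aggregation and deferral steps for a single binary registry variable. The source and encoding labels, the plain averaging used in place of the meta-learner, the threshold values 0.02 and 0.98, and all function names are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product
from statistics import fmean

# Hypothetical labels: the abstract states only the counts
# (7 unstructured text sources, 3 structured sources, 3 encodings).
SOURCES = [f"text_{i}" for i in range(1, 8)] + [f"structured_{i}" for i in range(1, 4)]
ENCODINGS = ["lm_embedding_a", "lm_embedding_b", "tfidf"]
MODEL_KEYS = list(product(SOURCES, ENCODINGS))  # 10 x 3 = 30 models per variable


def aggregate(probabilities):
    """Stand-in for the meta-learner: a plain average of per-model probabilities.

    The paper's ensemble meta-learner is presumably a trained model;
    averaging is used here only to keep the sketch self-contained.
    """
    return fmean(probabilities.values())


def decide(p, lower=0.02, upper=0.98):
    """Dual-threshold rule: auto-complete only when the ensemble is confident.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not lower < upper:
        raise ValueError("lower must be below upper")
    if p >= upper:
        return "auto_positive"
    if p <= lower:
        return "auto_negative"
    return "human_review"  # uncertain band is deferred to a human abstractor


def completion_rate(decisions):
    """Share of variables completed without human review."""
    auto = sum(d != "human_review" for d in decisions)
    return auto / len(decisions)
```

In a setup like this, widening the band between the two thresholds trades automation for accuracy: fewer variables are completed automatically, but those that are completed rest on more confident predictions. That is the kind of trade-off the reported completion rates of 49.5% and 43.2% at over 99% accuracy reflect.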

Similar works