OpenAlex · Updated hourly · Last updated: 2 April 2026, 07:04

This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

Development and Validation of a Generative Artificial Intelligence-Based Pipeline for Automated Clinical Data Extraction from Electronic Health Records: Technical Implementation Study (Preprint)

2024 · 0 citations · Open Access

Citations: 0 · Authors: 8 · Year: 2024

Abstract

Background: Manual abstraction of unstructured clinical data is often necessary for granular clinical outcomes research, but it is time-consuming and can be of variable quality. Large language models (LLMs) show promise in medical data extraction, yet integrating them into research workflows remains challenging and poorly described. We developed and integrated an LLM-based system for automated data extraction from unstructured electronic health record (EHR) text reports within an established clinical outcomes database.

Methods: We implemented a generative AI pipeline (UODBLLM) utilizing a flexible language model interface that supports various LLM implementations, including HIPAA-compliant cloud services and local open-source models. We used XML-structured prompts and integrated the pipeline with the database through an Open Database Connectivity (ODBC) interface to generate structured data from clinical documentation in the EHR. We evaluated UODBLLM's performance on completion rate, processing time, and extraction capabilities across multiple clinical data elements, including quantitative measurements, categorical assessments, and anatomical descriptions. System reliability was tested across multiple batches to assess scalability and consistency.

Results: Piloted against MRI reports, UODBLLM processed 1,800 clinical documents with a 100% completion rate and an average processing time of 8.90 seconds per report. Token utilization averaged 2,692 tokens per report, with an input-to-output ratio of approximately 6.5:1, resulting in a processing cost of $0.009 per report. UODBLLM had consistent performance across 18 batches of 100 reports each and completed all processing in 4.45 hours. From each report, UODBLLM extracted 16 structured clinical elements, including prostate volume, PSA values, PI-RADS scores, clinical staging, and anatomical assessments. All extracted data was automatically validated against predefined schemas and stored in standardized JSON format.

Conclusion: We demonstrated successful integration of an LLM-based extraction system within an existing clinical outcomes database, achieving rapid, comprehensive data extraction at minimal cost. UODBLLM provides a scalable, efficient solution for automating clinical data extraction while maintaining PHI security. This approach could significantly accelerate research timelines and expand feasible clinical studies, particularly for large-scale database projects.
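
The abstract mentions XML-structured prompts and validation of extracted data against predefined schemas, but gives no implementation detail. The following is a minimal sketch of what such a step might look like, not the authors' code. The three field names, their types, the prompt wording, and the `fake_llm` stand-in are all hypothetical; a real pipeline would call an actual model and cover all 16 elements.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical schema: field name -> accepted Python type(s). The abstract names
# prostate volume, PSA values and PI-RADS scores among the extracted elements;
# these keys and types are illustrative assumptions, not the authors' schema.
SCHEMA = {
    "prostate_volume_ml": (int, float),
    "psa_ng_ml": (int, float),
    "pirads_score": int,
}


def build_prompt(report_text: str) -> str:
    """Wrap instructions, the field list and the report text in XML tags."""
    root = ET.Element("prompt")
    ET.SubElement(root, "instructions").text = (
        "Extract the listed fields from the report and answer with JSON only. "
        "Use null for fields the report does not state."
    )
    fields = ET.SubElement(root, "fields")
    for name in SCHEMA:
        ET.SubElement(fields, "field", name=name)
    # ElementTree escapes characters such as '<' in the report text.
    ET.SubElement(root, "report").text = report_text
    return ET.tostring(root, encoding="unicode")


def validate(raw_json: str) -> dict:
    """Parse a model reply and check keys and value types against SCHEMA."""
    data = json.loads(raw_json)
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        raise ValueError("reply does not contain exactly the schema fields")
    for name, expected in SCHEMA.items():
        value = data[name]
        if value is None:
            continue  # null marks a field the report does not state
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{name}: unexpected value {value!r}")
    return data


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); returns a fixed reply."""
    return '{"prostate_volume_ml": 42.5, "psa_ng_ml": 6.1, "pirads_score": 4}'


report = "Prostate volume 42.5 mL. PSA 6.1 ng/mL. PI-RADS 4."
record = validate(fake_llm(build_prompt(report)))
```

Here `record` is a plain dictionary that could be serialized with `json.dumps` and written back to the database; the ODBC write step described in the abstract is left out of this sketch.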

Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Machine Learning in Healthcare