This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Generative artificial intelligence for automated data extraction from unstructured medical text

2025 · 5 citations · JAMIA Open · Open Access

Abstract

Objectives: Unstructured data, such as procedure notes, contain valuable medical information that is frequently underutilized due to the labor-intensive nature of data extraction. This study aims to develop a generative artificial intelligence (GenAI) pipeline using an open-source Large Language Model (LLM) with built-in guardrails and a retry mechanism to extract data from unstructured right heart catheterization (RHC) notes while minimizing errors, including hallucinations.

Materials and Methods: A total of 220 RHC notes were randomly selected for pipeline development and 200 for validation from the Pulmonary Vascular Disease Registry. The pipeline comprised three main components: the Engineered Preload Framework (EPF), which integrated schemas and instructions; the LLM module, enhanced by reasoning capabilities; and the validation and retry mechanism, which ensured data accuracy through iterative self-correction. A clinical expert manually extracted data from the validation cohort to establish the ground truth. Pipeline performance was evaluated using precision, recall, and F1 score. Additionally, the dataset was stratified into quartiles to assess the pipeline's ability to handle varying levels of data availability.

Results: The pipeline achieved 99.0% precision, 85.0% recall, and a 91.5% F1 score, with an overall accuracy of 90% when evaluated at the note level. The most common error was missed values (5.2%), while hallucinations were the least frequent (<0.01%).

Discussion and Conclusion: This study demonstrates the feasibility of a robust GenAI pipeline for automating structured data extraction from unstructured RHC procedure notes. The approach highlights the potential of LLMs in medical data mining, improving research efficiency and clinical applications.
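
The abstract names a schema-driven prompt, an LLM call, and a validation and retry step, but gives no implementation details. The following is a minimal Python sketch of how such a loop might look, not the authors' pipeline. The field names, plausibility ranges, prompt wording, the substring-based hallucination check, and the stub LLM are all illustrative assumptions.

```python
import json
from typing import Callable, Optional

# Hypothetical schema: field name -> (min, max) plausible range.
# Fields and ranges are illustrative assumptions, not taken from the paper.
SCHEMA = {
    "mean_pap_mmHg": (0.0, 150.0),
    "pcwp_mmHg": (0.0, 60.0),
}


def build_prompt(note: str, feedback: tuple = ()) -> str:
    """Combine schema instructions, the note, and any validator feedback."""
    prompt = (
        f"Return a JSON object with the keys {', '.join(SCHEMA)}; "
        f"use null for values the note does not report.\n\nNote:\n{note}"
    )
    if feedback:
        prompt += "\n\nYour previous answer was rejected:\n"
        prompt += "\n".join(f"- {problem}" for problem in feedback)
    return prompt


def validate(raw: str, note: str) -> list[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"output is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["output must be a JSON object"]
    errors = [f"unexpected field: {key}" for key in data if key not in SCHEMA]
    for key, (low, high) in SCHEMA.items():
        if key not in data:
            errors.append(f"missing field: {key}")
            continue
        value = data[key]
        if value is None:
            continue  # null means "not reported in the note"
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{key} must be a number or null")
        elif not low <= value <= high:
            errors.append(f"{key}={value} is outside [{low}, {high}]")
        elif format(value, "g") not in note:
            # Crude guard against hallucination: the number must occur
            # somewhere in the note text (simple substring match).
            errors.append(f"{key}={value} does not appear in the note")
    return errors


def extract_with_retry(
    note: str, llm: Callable[[str], str], max_attempts: int = 3
) -> Optional[dict]:
    """Call the LLM, validate its output, and retry with feedback on failure."""
    prompt = build_prompt(note)
    for _ in range(max_attempts):
        raw = llm(prompt)
        errors = validate(raw, note)
        if not errors:
            return json.loads(raw)
        prompt = build_prompt(note, feedback=tuple(errors))
    return None  # give up; such a note could be flagged for manual review


def make_stub_llm(replies: list[str]) -> Callable[[str], str]:
    """Stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


if __name__ == "__main__":
    note = "Mean PA pressure 42 mmHg. PCWP 12 mmHg."
    llm = make_stub_llm([
        '{"mean_pap_mmHg": 42, "pcwp_mmHg": 18}',  # 18 is not in the note
        '{"mean_pap_mmHg": 42, "pcwp_mmHg": 12}',  # passes validation
    ])
    print(extract_with_retry(note, llm))
```

In this sketch the first canned reply is rejected because the value 18 never occurs in the note, and the second reply is accepted. A validator that rejects values it cannot confirm trades recall for precision, which is one plausible reading of the reported 99.0% precision against 85.0% recall, although the abstract does not state this.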

Institutions

Brigham and Women's Hospital (US)

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
