OpenAlex · Updated hourly · Last updated: 11 March 2026, 06:42

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ACCELERATING PATHOLOGY REPORT DIGITIZATION: A MULTI-ENGINE OCR AND LLM FRAMEWORK FOR HEALTHCARE APPLICATIONS

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Citations: 0 · Authors: 1 · Year: 2026

Abstract

Digitization and structuring of pathology reports are essential in modern healthcare for enhancing patient care, data analytics, and medical research. This study presents a framework called Dual-integrated Text Extraction using Hybrid OCR Engines (DiText-OCR), which leverages multiple OCR tools and domain-specific dictionaries to accurately digitize diverse text types, including printed text and low-quality scans. The extracted text is further processed using Large Language Models (LLMs) for named entity recognition, relationship extraction, and data structuring. The resulting structured data are integrated into healthcare databases and systems, enabling applications in clinical decision support, research, and analytics while ensuring interoperability. Despite its effectiveness, the framework faces challenges, such as handling non-standard report formats, maintaining patient privacy, and addressing the current limitations of OCR and LLM technologies in medical contexts. Future research aims to integrate this system with electronic health records, extend its application to other medical documents, and utilize structured data for advanced research and predictive analytics. By addressing these challenges, the proposed framework has the potential to revolutionize medical data management, ultimately improving patient outcomes, enhancing clinical efficiency, and fostering innovation in healthcare.
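The abstract does not say how DiText-OCR combines the outputs of its OCR engines or how it applies its domain dictionaries. As a minimal sketch of one plausible approach, assume each engine returns a plain string, the candidate with the most dictionary hits is chosen, and near-miss tokens are then snapped to the closest dictionary term. The term list, function names, and the 0.8 similarity cutoff are all hypothetical and are not taken from the paper.

```python
from difflib import get_close_matches

# Hypothetical domain dictionary; the paper's actual dictionaries are not listed.
DOMAIN_TERMS = [
    "carcinoma", "adenocarcinoma", "biopsy", "margin",
    "invasive", "ductal", "benign", "malignant",
]


def correct_token(token, vocabulary, cutoff=0.8):
    """Snap a token to the closest dictionary term if one is similar enough."""
    lowered = token.lower()
    if lowered in vocabulary:
        return lowered
    matches = get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    # Leave the token unchanged when nothing in the dictionary is close.
    return matches[0] if matches else token


def dictionary_score(text, vocabulary):
    """Fraction of whitespace-separated tokens that are exact dictionary terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)


def merge_engine_outputs(candidates, vocabulary):
    """Pick the engine output with the best dictionary score, then correct it."""
    best = max(candidates, key=lambda text: dictionary_score(text, vocabulary))
    return " ".join(correct_token(t, vocabulary) for t in best.split())


# Two simulated engine outputs for the same scanned line (made-up data).
engine_outputs = ["invasive ductal carcin0ma", "lnvasive ductal carcinoma"]
print(merge_engine_outputs(engine_outputs, DOMAIN_TERMS))
```

This sketch tokenizes on whitespace only, so punctuation attached to a word can be lost when the token is snapped to a dictionary term. A real pipeline would also need per-token alignment across engines and engine confidence scores, neither of which is modeled here.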

Topics

Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare