This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
From registers to research: Data quality challenges in Finnish EHR – Enabling AI-driven research through data preparation and standardization
Citations: 0
Authors: 8
Year: 2026
Abstract
Finland’s nationwide health registers and universal healthcare system provide comprehensive, longitudinal electronic health record data with strong potential for artificial intelligence-driven clinical research. Despite their coverage and richness, raw electronic health record datasets are not directly usable for advanced analytics due to fragmentation, heterogeneity, missingness, and inconsistencies in documentation and coding. This study describes data quality challenges encountered while preparing a breast cancer cohort (diagnosed 2012–2022) from the Wellbeing Services County of North Ostrobothnia as a preparatory step for artificial intelligence-based survival modelling. The initial dataset included 8074 patients across multiple domain-specific data files capturing diagnoses, laboratory results, pathology reports, medications, and procedural information. Following predefined cohort restrictions and preprocessing steps, 1967 patients remained for analysis. Identified challenges were grouped into four categories: (1) human-generated errors, (2) decentralization-generated issues, (3) time or system life-cycle–generated inconsistencies, and (4) data governance/coding issues. The study adopts a qualitative research design aimed at systematically characterizing data quality challenges relevant to artificial intelligence applications; no predictive model was developed at this stage. The findings demonstrate that substantial preprocessing and data loss are often unavoidable in real-world electronic health record research. They emphasize the need for standardized documentation, harmonized data structures, and closer collaboration between healthcare professionals, system developers, and data scientists to fully realize the potential of register-based electronic health record data in reliable artificial intelligence-driven research.