This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Preprocessing of unstructured medical data: the impact of each preprocessing stage on classification
Citations: 25
Authors: 3
Year: 2020
Abstract
Developing methods for processing data in Russian, medical texts in particular, remains important. In this paper, we examine how each stage of text preprocessing affects the classifier's results. We analyzed 269923 records of patients' allergic anamnesis, 11670 of which were selected for further processing. We consider the main preprocessing stages: tokenization, stop-word removal, error correction, document cropping, normalization, class harmonization, and vectorization. The data are vectorized with Bag-of-Words. Logistic regression was chosen for classification because it is easy to reproduce and interpret. Precision, recall, and F-measure serve as evaluation metrics. The results (F = 88.12%) showed that normalization and error correction were the most effective stages.
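The abstract names the stages but not their implementation, so the following is only a minimal sketch of three of them (tokenization, stop-word removal, Bag-of-Words vectorization) plus the evaluation metrics, using the Python standard library. The stop-word list, the function names, and the example record are invented for illustration and are not taken from the paper. The F-measure is assumed here to be the balanced F1 score, which the abstract does not confirm. Error correction, normalization, document cropping, class harmonization, and the logistic regression classifier itself are omitted.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not the paper's list).
STOP_WORDS = {"на", "и", "в", "у"}


def tokenize(text):
    """Lowercase the text and extract runs of Latin or Cyrillic letters."""
    return re.findall(r"[a-zа-яё]+", text.lower())


def preprocess(text):
    """Tokenize, then drop stop words."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def build_vocabulary(docs):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return the token-count vector of one document over the vocabulary."""
    counts = Counter(doc)
    vector = [0] * len(vocab)
    for tok, idx in vocab.items():
        vector[idx] = counts.get(tok, 0)
    return vector


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class (F1 is an assumed reading of 'F-measure')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented toy record, not data from the paper.
    doc = preprocess("Аллергия на пенициллин")
    vocab = build_vocabulary([doc])
    print(doc, bag_of_words(doc, vocab))
```

The count vectors produced by `bag_of_words` are the kind of input a logistic regression classifier would consume. Measuring the effect of a single stage, as the paper does, would mean rerunning the same classifier with that stage switched on or off and comparing the resulting metrics.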
Similar works
"Why Should I Trust You?"
2016 · 14,732 citations
Coding Algorithms for Defining Comorbidities in ICD-9-CM and ICD-10 Administrative Data
2005 · 10,547 citations
A Comprehensive Survey on Graph Neural Networks
2020 · 8,949 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,550 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,061 citations