This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

A GPT-2 Language Model for Biomedical Texts in Portuguese

2021 · 47 citations

Abstract

Electronic health records (EHRs) contain patient-related information in both structured and unstructured form, making them a valuable data source for Natural Language Processing (NLP) in the healthcare domain. Contextual word embeddings and Transformer-based models have proved their potential, reaching state-of-the-art results on various NLP tasks. Although performance on downstream NLP tasks with free text written in English has recently improved, fewer resources are available for clinical texts and for low-resource languages such as Portuguese. Our objective is to develop a Generative Pre-trained Transformer 2 (GPT-2) language model for Portuguese to support clinical and biomedical NLP tasks. Using transfer learning, we fine-tuned a generic Portuguese GPT-2 model on corpora of biomedical texts written in Portuguese. We evaluated it on a public, manually annotated dataset for detecting patient falls, i.e., a classification task. Our in-domain GPT-2 model outperformed the generic Portuguese GPT-2 model by 3.43 in weighted F1-score. Our preliminary results show that transfer learning with domain literature can benefit Portuguese biomedical NLP tasks, in line with results for other languages.
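The abstract reports its result as a weighted F1-score. As a minimal sketch of what that metric usually means (per-class F1 averaged with weights proportional to each class's share of the true labels), the following may help. The function name `weighted_f1`, the label names, and the toy data are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true examples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy labels, invented for illustration only (not the paper's data).
y_true = ["fall", "fall", "no_fall", "no_fall", "no_fall", "no_fall"]
y_pred = ["fall", "no_fall", "no_fall", "no_fall", "no_fall", "fall"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support keeps a rare class such as patient falls from being ignored entirely while still reflecting the class balance of the evaluation set.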

Institutions

Pontifícia Universidade Católica do Paraná (BR)

Topics

Topic Modeling, Machine Learning in Healthcare, Natural Language Processing Techniques
