OpenAlex · Updated hourly · Last updated: 24.05.2026, 02:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

A GPT-2 Language Model for Biomedical Texts in Portuguese

2021 · 47 citations

5 authors

Abstract

Electronic health records (EHRs) contain patient-related information in both structured and unstructured form, making them a valuable data source for Natural Language Processing (NLP) in the healthcare domain. Contextual word embeddings and Transformer-based models have proved their potential, reaching state-of-the-art results on various NLP tasks. Although performance on downstream NLP tasks with free text written in English has recently improved, fewer resources are available for clinical texts and for low-resource languages such as Portuguese. Our objective is to develop a Generative Pre-trained Transformer 2 (GPT-2) language model for Portuguese to support clinical and biomedical NLP tasks. Using transfer learning, we fine-tuned a generic Portuguese GPT-2 model on corpora of biomedical texts written in Portuguese. We evaluated it on a public dataset manually annotated for detecting patient falls, i.e., a classification task. Our in-domain GPT-2 model outperformed the generic Portuguese GPT-2 model by 3.43 in weighted F1-score. Our preliminary results show that transfer learning with domain literature can benefit Portuguese biomedical NLP tasks, in line with results for other languages.
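
The abstract compares the two models by weighted F1-score. As a minimal sketch of how that metric is commonly defined (per-class F1 averaged with weights proportional to each class's support), the following Python computes it from scratch. The function name `weighted_f1` and the toy labels are hypothetical illustrations; they are not the paper's code or data, and the paper may have used a library implementation instead.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Each class's F1 is weighted by the fraction of true labels
    that belong to that class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("y_true must not be empty")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical toy labels (1 = fall, 0 = no fall); NOT data from the paper.
y_true = [1, 0, 0, 1, 0, 0]
generic_pred = [0, 0, 0, 1, 1, 0]
in_domain_pred = [1, 0, 0, 1, 1, 0]

generic_score = weighted_f1(y_true, generic_pred)
in_domain_score = weighted_f1(y_true, in_domain_pred)
print(f"generic:   {generic_score:.4f}")
print(f"in-domain: {in_domain_score:.4f}")
print(f"gain:      {in_domain_score - generic_score:.4f}")
```

Weighting by support keeps the score from being dominated by a rare class, which matters for tasks such as fall detection where positive cases are typically less frequent than negative ones.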

Similar works