Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Performance of Multiple Pretrained BERT Models to Automate and Accelerate Data Annotation for Large Datasets
33
Zitationen
6
Autoren
2022
Jahr
Abstract
Purpose: To develop and evaluate domain-specific and pretrained bidirectional encoder representations from transformers (BERT) models in a transfer learning task on varying training dataset sizes to annotate a larger overall dataset. Materials and Methods: The authors retrospectively reviewed 69 095 anonymized adult chest radiograph reports (reports dated April 2020-March 2021). From the overall cohort, 1004 reports were randomly selected and labeled for the presence or absence of each of the following devices: endotracheal tube (ETT), enterogastric tube (NGT, or Dobhoff tube), central venous catheter (CVC), and Swan-Ganz catheter (SGC). Pretrained transformer models (BERT, PubMedBERT, DistilBERT, RoBERTa, and DeBERTa) were trained, validated, and tested on 60%, 20%, and 20%, respectively, of these reports through fivefold cross-validation. Additional training involved varying dataset sizes with 5%, 10%, 15%, 20%, and 40% of the 1004 reports. The best-performing epochs were used to assess area under the receiver operating characteristic curve (AUC) and determine run time on the overall dataset. Results: The highest average AUCs from fivefold cross-validation were 0.996 for ETT (RoBERTa), 0.994 for NGT (RoBERTa), 0.991 for CVC (PubMedBERT), and 0.98 for SGC (PubMedBERT). DeBERTa demonstrated the highest AUC for each support device trained on 5% of the training set. PubMedBERT showed a higher AUC with a decreasing training set size compared with BERT. Training and validation time was shortest for DistilBERT at 3 minutes 39 seconds on the annotated cohort. Conclusion: . ©RSNA, 2022See also the commentary by Zech in this issue.
Ähnliche Arbeiten
La certeza de lo impredecible: Cultura Educación y Sociedad en tiempos de COVID19
2020 · 19.307 Zit.
A Multi-Modal Distributed Real-Time IoT System for Urban Traffic Control (Invited Paper)
2024 · 14.302 Zit.
UNet++: A Nested U-Net Architecture for Medical Image Segmentation
2018 · 8.794 Zit.
Review of deep learning: concepts, CNN architectures, challenges, applications, future directions
2021 · 7.406 Zit.
scikit-image: image processing in Python
2014 · 6.840 Zit.