This is an overview page with metadata about this scholarly work. The full article is available from the publisher.

PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts

2017 · 73 citations · arXiv (Cornell University) · Open Access



Abstract

We present PubMed 200k RCT, a new dataset based on PubMed for sequential sentence classification. The dataset consists of approximately 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences. Each sentence of each abstract is labeled with its role in the abstract using one of the following classes: background, objective, method, result, or conclusion. The purpose of releasing this dataset is twofold. First, the majority of datasets for sequential short-text classification (i.e., classification of short texts that appear in sequences) are small: we hope that releasing a new large dataset will help develop more accurate algorithms for this task. Second, from an application perspective, researchers need better tools to efficiently skim through the literature. Automatically classifying each sentence in an abstract would help researchers read abstracts more efficiently, especially in fields where abstracts may be long, such as the medical field.
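What makes the task "sequential" is that each abstract is an ordered list of (label, sentence) pairs, so a classifier can use the neighboring sentences and not just each sentence in isolation. The following is a minimal Python sketch of that data structure. It assumes a plain-text layout in which each abstract starts with a `###<id>` line, followed by one `LABEL<TAB>sentence` line per sentence, with blank lines between abstracts. It also assumes the five classes are spelled in uppercase as `BACKGROUND`, `OBJECTIVE`, `METHODS`, `RESULTS`, and `CONCLUSIONS`. Both the layout and the spellings are assumptions modeled on the publicly released files and are not stated on this page, so check them against the actual data. `Abstract` and `parse_abstracts` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# ASSUMPTION: uppercase spellings of the five classes named in the abstract.
LABELS = ("BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS")


@dataclass
class Abstract:
    """One abstract: its id plus (label, sentence) pairs in reading order."""
    abstract_id: str
    sentences: list[tuple[str, str]]


def parse_abstracts(text: str) -> list[Abstract]:
    """Parse the ASSUMED '###id' / 'LABEL<TAB>sentence' layout (hypothetical helper)."""
    abstracts: list[Abstract] = []
    current: Abstract | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate abstracts
        if line.startswith("###"):
            # A new abstract begins; later sentence lines attach to it.
            current = Abstract(abstract_id=line[3:], sentences=[])
            abstracts.append(current)
            continue
        label, _, sentence = line.partition("\t")
        if current is None or label not in LABELS:
            raise ValueError(f"unexpected line: {raw!r}")
        # Appending preserves sentence order, which sequential models rely on.
        current.sentences.append((label, sentence))
    return abstracts
```

For example, `parse_abstracts("###123\nBACKGROUND\tAsthma is common.\nMETHODS\tWe randomized 100 patients.\n")` yields one `Abstract` whose `sentences` list keeps the two pairs in order. A sequence model would then predict the whole label list for an abstract jointly, not one sentence at a time.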


Topics

Artificial Intelligence in Healthcare and Education; Meta-analysis and systematic reviews; Machine Learning in Healthcare
