OpenAlex · Updated hourly · Last updated: Mar 23, 2026, 13:08

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Large Language Models for Health Care Text Classification: Systematic Review

2025 · 1 citation · JMIR AI · Open Access
Open full text at publisher

Citations: 1
Authors: 2
Year: 2025

Abstract

Background: Large language models (LLMs) have fundamentally transformed approaches to natural language processing tasks across diverse domains. In health care, accurate and cost-efficient text classification is crucial, whether for clinical note analysis, diagnosis coding, or other related tasks, and LLMs show considerable promise here. Text classification has long faced multiple challenges, including the need for manual annotation during training, the handling of imbalanced data, and the development of scalable approaches. Health care adds further challenges, particularly the critical need to preserve patient data privacy and the complexity of medical terminology. Numerous studies have leveraged LLMs for automated health care text classification and compared their performance with traditional machine learning–based methods, which typically require embedding, annotation, and training. However, existing systematic reviews of LLMs either do not specialize in text classification or do not focus specifically on the health care domain.

Objective: This review synthesizes and critically evaluates the current evidence in the literature on the use of LLMs for text classification in health care settings.

Methods: Major databases (eg, Google Scholar, Scopus, PubMed, ScienceDirect) and other resources were queried for papers published between 2018 and 2024, following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, yielding 65 eligible research articles. These studies were categorized by text classification type (eg, binary classification, multilabel classification), application (eg, clinical decision support, public health and opinion analysis), methodology, type of health care text, and the metrics used for evaluation and validation.

Results: The 65 included articles were published between 2020 and Q3 2024, showing a significant increase in publications over time, with 28 papers published in Q1-Q3 2024 alone. Fine-tuning was the most common LLM-based approach (35 papers), followed by prompt engineering (17 papers). BERT (Bidirectional Encoder Representations from Transformers) variants were predominantly used for multilabel classification (50%), whereas closed-source LLMs were most commonly applied to binary (44.0%) and multiclass (30.6%) classification tasks. Clinical decision support was the most frequent application (29 papers). Over 80% of studies used English-language datasets, with clinical notes being the most common text type. All studies employed accuracy-related metrics for evaluation, and the findings consistently showed that LLMs outperformed traditional machine learning approaches in health care text classification tasks.

Conclusions: This review identifies existing gaps in the literature and highlights future research directions for further investigation.
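As an illustration of the prompt-engineering style of LLM-based classification the review describes (this sketch is not taken from the paper), a zero-shot binary classifier can be built by embedding the label set directly in the prompt. Here `call_llm` is a hypothetical stand-in for any chat-completion API, stubbed with keyword rules so the example runs without a model; the labels `URGENT`/`ROUTINE` and the cue words are illustrative assumptions.

```python
def call_llm(prompt: str) -> str:
    """Stub LLM. In practice, replace this with a real API call;
    the keyword rules below only simulate a model's answer."""
    note = prompt.split("Note:", 1)[1].lower()
    urgent_cues = ("chest pain", "shortness of breath", "unresponsive")
    return "URGENT" if any(cue in note for cue in urgent_cues) else "ROUTINE"


def classify_note(note: str) -> str:
    """Zero-shot binary classification: the prompt itself defines the
    label set, so no annotation, embedding, or training is needed."""
    prompt = (
        "Classify the following clinical note as URGENT or ROUTINE. "
        "Answer with a single word.\n"
        f"Note: {note}"
    )
    answer = call_llm(prompt).strip().upper()
    # Guard against free-form model output: fall back to ROUTINE
    # if the reply is not one of the expected labels.
    return answer if answer in {"URGENT", "ROUTINE"} else "ROUTINE"


print(classify_note("Patient reports sudden chest pain radiating to left arm."))
```

The contrast with the traditional pipelines mentioned above is that the label definitions live in the prompt text rather than in an annotated training set; fine-tuning approaches instead update model weights on labeled examples.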

Topics

Machine Learning in Healthcare · Text and Document Classification Technologies · Artificial Intelligence in Healthcare and Education