Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Leveraging large language models for knowledge-free weak supervision in clinical natural language processing

2025·6 Zitationen·Scientific ReportsOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

The performance of deep learning-based natural language processing systems is based on large amounts of labeled training data which, in the clinical domain, are not easily available or affordable. Weak supervision and in-context learning offer partial solutions to this issue, particularly using large language models (LLMs), but their performance still trails traditional supervised methods with moderate amounts of gold-standard data. In particular, inferencing with LLMs is computationally heavy. We propose an approach leveraging fine-tuning LLMs and weak supervision with virtually no domain knowledge that still achieves consistently dominant performance. Using a prompt-based approach, the LLM is used to generate weakly-labeled data for training a downstream BERT model. The weakly supervised model is then further fine-tuned on small amounts of gold standard data. We evaluate this approach using Llama2 on three different i2b2/ n2c2 datasets for clinical named entity recognition. With no more than 10 gold standard notes, our final BERT models weakly supervised by fine-tuned Llama2-13B consistently outperformed out-of-the-box PubMedBERT by 4.7-47.9% in F1 scores. With only 50 gold standard notes, our models achieved close performance to fully fine-tuned systems.

Autoren

Institutionen

Themen

Machine Learning in HealthcareTopic ModelingArtificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

Leveraging large language models for knowledge-free weak supervision in clinical natural language processing

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen