This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

HALO-GPT: Hindi Active Learning with Oracle GPT-3.5

2025 · 0 citations · Open Access



Abstract

Obtaining high-quality annotations for low-resource (LR) languages remains a significant bottleneck for training supervised deep learning models, including models for Named Entity Recognition (NER). This work investigates the use of GPT-3.5 as an oracle annotator within an active learning (AL) framework to minimize human effort while preserving annotation quality. We evaluate its performance across six diverse Hindi NER datasets, spanning general, medical, and code-mixed domains, using uncertainty-based sampling strategies to iteratively select the most informative sentences for labeling. Our experiments reveal that while GPT-3.5 struggles with domain-specific and low-frequency entities, it maintains strong performance on common entity types. Despite lower average scores on the more challenging datasets, per-entity performance for Person, Organization, and Location remained competitive, reaching maximum F1-scores of 0.82, 0.78, and 0.81, respectively. To the best of our knowledge, this is the first large-scale study demonstrating the use of GPT-3.5 as an active annotator for low-resource NER in an Indian language.
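The abstract does not state which uncertainty measure, batch size, or NER model the authors used. The following is a minimal Python sketch of generic pool-based uncertainty sampling, assuming a model that returns per-token tag probabilities. All names (`predict_proba`, `oracle`, `train`, `least_confidence`, `mean_token_entropy`, `select_batch`, `active_learning_loop`) are hypothetical, and `oracle` is a stub standing in for a GPT-3.5 annotation call. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A sentence is a list of tokens; a tag sequence is a list of BIO labels.
Sentence = List[str]
Tags = List[str]
# One probability distribution over the tag set per token, e.g. [[0.9, 0.1], [0.6, 0.4]].
TokenProbs = List[List[float]]


def least_confidence(token_probs: TokenProbs) -> float:
    """Mean over tokens of (1 - max tag probability); higher means less certain."""
    if not token_probs:
        return 0.0
    return sum(1.0 - max(p) for p in token_probs) / len(token_probs)


def mean_token_entropy(token_probs: TokenProbs) -> float:
    """Mean Shannon entropy (in nats) of the per-token tag distributions."""
    if not token_probs:
        return 0.0
    total = 0.0
    for dist in token_probs:
        total -= sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_probs)


def select_batch(pool_probs: Sequence[TokenProbs], k: int,
                 score: Callable[[TokenProbs], float] = least_confidence) -> List[int]:
    """Return the indices of the k most uncertain sentences in the pool."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]


def active_learning_loop(
    pool: List[Sentence],
    predict_proba: Callable[[Sentence], TokenProbs],  # hypothetical: current NER model
    oracle: Callable[[Sentence], Tags],               # hypothetical: stand-in for GPT-3.5
    train: Callable[[List[Tuple[Sentence, Tags]]], None],  # hypothetical: retraining hook
    rounds: int,
    k: int,
) -> List[Tuple[Sentence, Tags]]:
    """Each round: score the pool, send the k most uncertain sentences to the oracle, retrain."""
    labeled: List[Tuple[Sentence, Tags]] = []
    pool = list(pool)  # copy so the caller's list is not mutated
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_proba(s) for s in pool]
        chosen = set(select_batch(probs, k))
        for i in sorted(chosen):
            labeled.append((pool[i], oracle(pool[i])))
        pool = [s for i, s in enumerate(pool) if i not in chosen]
        train(labeled)  # retrain on everything labeled so far
    return labeled
```

Passing the model, oracle, and training step as callables keeps the selection logic independent of any particular NER architecture or LLM client. A real GPT-3.5 oracle would additionally need a prompt and a parser that map the model's response back to token-level BIO tags, which this sketch does not attempt.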


Institutions

Indian Institute of Technology Guwahati (IN)

Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning and Algorithms
