This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
HALO-GPT: Hindi Active Learning with Oracle GPT-3.5
Citations: 0
Authors: 3
Year: 2025
Abstract
Obtaining high-quality annotations for low-resource (LR) languages remains a significant bottleneck for training supervised deep learning models, including for Named Entity Recognition (NER) tasks. This work investigates the use of GPT-3.5 as an oracle annotator within an active learning (AL) framework to minimize human effort while preserving annotation quality. We evaluate its performance across six diverse Hindi NER datasets, spanning general, medical, and code-mixed domains, using uncertainty-based sampling strategies to iteratively select the most informative sentences for labeling. Our experiments reveal that while GPT-3.5 struggles with domain-specific and low-frequency entities, it maintains strong performance for common entity types. Despite a decrease in average scores on challenging datasets, per-entity performance for Person, Organization, and Location remained competitive, achieving maximum F1-scores of 0.82, 0.78, and 0.81, respectively. To the best of our knowledge, this is the first large-scale study demonstrating the use of GPT-3.5 as an active annotator for low-resource NER tasks in an Indian language.
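The abstract does not say which uncertainty measure the authors use, so the following is only a minimal sketch of one common choice, least-confidence sampling, for picking sentences to send to an annotator. The function names `least_confidence` and `select_batch`, the per-token averaging, and the toy probabilities are all assumptions made for illustration; the call to the GPT-3.5 oracle itself is not shown.

```python
from typing import List, Sequence

# One sentence = a list of per-token probability distributions over NER tags.
TokenProbs = Sequence[Sequence[float]]


def least_confidence(sentence_probs: TokenProbs) -> float:
    """Return 1 minus the mean of each token's highest tag probability.

    Higher values mean the model is less sure about the sentence.
    (Assumed scoring rule, not taken from the paper.)
    """
    if not sentence_probs:
        return 0.0
    top = [max(token) for token in sentence_probs]
    return 1.0 - sum(top) / len(top)


def select_batch(pool_probs: Sequence[TokenProbs], k: int) -> List[int]:
    """Return indices of the k most uncertain sentences in the unlabeled pool."""
    scored = [(least_confidence(p), i) for i, p in enumerate(pool_probs)]
    # Sort by descending uncertainty; break ties by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# Toy pool of three sentences with made-up model outputs.
pool = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],    # confident
    [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2]],    # uncertain
    [[0.6, 0.3, 0.1]],                       # in between
]

chosen = select_batch(pool, k=2)
print(chosen)  # indices of sentences that would be sent to the oracle
```

In an active learning loop of the kind described, the selected sentences would be labeled by the oracle, added to the training set, and the model retrained before the next selection round.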