OpenAlex · Updated hourly · Last updated: 18.03.2026, 15:45

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Detecting Human vs AI-Generated Text in Urdu: A Comparative Study of Deep Learning and Transformer Models

2025 · 0 citations · 2 authors
Open full text at publisher

Abstract

The emergence of large language models (LLMs) such as GPT-4 has blurred the line between human- and machine-generated writing, raising concerns about misinformation, academic integrity, and content authenticity. This challenge is particularly acute for Urdu, a low-resource language with limited datasets and NLP tools. To address this gap, we used the Urdu Human and AI Text (UHAT) dataset and conducted a comparative study of deep learning and transformer-based models for binary classification. Conventional sequence models (RNN, LSTM, and GRU with Word2Vec and FastText embeddings) were benchmarked against pretrained transformers including BERT, mBERT, UrduBERT, and DistilBERT. Experimental results show that transformer models significantly outperform recurrent architectures. Among them, multilingual BERT (mBERT) achieved the best performance, with 91.67% accuracy and F1-score, surpassing UrduBERT and previous benchmarks such as XLM-RoBERTa on the HLU corpus. These findings establish strong baselines for Urdu AI-text detection and underscore the potential of multilingual transformers in low-resource settings.
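
The page does not include implementation details. For orientation only, the following is a minimal Python sketch of how a binary human-vs-AI classifier could be fine-tuned on mBERT with the Hugging Face Transformers library; the file names (uhat_train.csv, uhat_test.csv), column names, and hyperparameters are illustrative assumptions, not the authors' actual setup.

# A minimal sketch (not the authors' code): fine-tuning mBERT for binary
# human-vs-AI Urdu text classification with Hugging Face Transformers.
# File names, column names, and hyperparameters are illustrative assumptions;
# the UHAT dataset format is not described on this page.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT checkpoint

# Hypothetical CSV files with a "text" column (Urdu passage) and a
# "label" column (0 = human-written, 1 = AI-generated).
data = load_dataset(
    "csv",
    data_files={"train": "uhat_train.csv", "test": "uhat_test.csv"},
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Truncate/pad each Urdu passage to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

# Two output labels: human vs AI-generated.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="mbert-urdu-ai-detector",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
)

trainer.train()
# Reports eval loss; accuracy/F1 as in the paper would need a compute_metrics hook.
print(trainer.evaluate())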

Topics

Artificial Intelligence in Healthcare and Education · Topic Modeling · Computational and Text Analysis Methods