This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
AI-Generated Text Detection in Low-Resource Languages: A Case Study on Urdu
0
Citations
3
Authors
2025
Year
Abstract
Large Language Models (LLMs) can now generate text that closely resembles human writing, which makes them powerful tools for content creation but also makes it harder to tell whether a piece of text was written by a human or by a machine. The challenge is more serious for languages like Urdu, where very few tools are available to detect AI-generated text. To address this gap, we propose a novel AI-generated text detection framework tailored to the Urdu language. A balanced dataset was developed, comprising 1,800 human-authored and 1,800 AI-generated texts, the latter sourced from models such as Gemini, GPT-4o-mini, and Kimi AI. A detailed linguistic and statistical analysis was conducted, focusing on features such as character and word counts, vocabulary richness (Type-Token Ratio), and N-gram patterns, with significance evaluated through t-tests and Mann-Whitney U tests. Three state-of-the-art multilingual transformer models, mdeberta-v3-base, distilbert-base-multilingual-cased, and xlm-roberta-base, were fine-tuned on this dataset. mDeBERTa-v3-base achieved the highest performance, with an F1-score of 91.29 and an accuracy of 91.26% on the test set. This research advances efforts to counter misinformation and academic misconduct in Urdu-speaking communities and contributes to the broader development of NLP tools for low-resource languages.
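The surface features named in the abstract (character count, word count, Type-Token Ratio, and N-gram patterns) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, and whitespace tokenization is an assumption, since the abstract does not say how the Urdu texts were tokenized.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; 0.0 for empty input."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def text_features(text, n=2):
    """Collect the surface features for one text.

    Assumption: tokens are split on whitespace; the tokenizer used in
    the paper is not specified in the abstract.
    """
    tokens = text.split()
    return {
        "char_count": len(text),
        "word_count": len(tokens),
        "ttr": type_token_ratio(tokens),
        "top_ngrams": ngram_counts(tokens, n).most_common(3),
    }
```

Computing these features per text for the human-authored and AI-generated groups would give two samples per feature, which could then be compared with a t-test or a Mann-Whitney U test as the abstract describes.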
Similar works
The spread of true and false news online
2018 · 8,065 citations
What is Twitter, a social network or a news media?
2010 · 6,664 citations
Social Media and Fake News in the 2016 Election
2017 · 6,419 citations
Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception
1983 · 6,261 citations
The Matthew Effect in Science
1968 · 6,161 citations