This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Machine Learning-Based Detection of AI-Generated Text via Stylistic and Statistical Feature Modeling

2025 · 0 citations

Abstract

Advances in large language models (LLMs) have made it easy to create AI-generated text. However, these tools can also pose a threat, e.g. through the creation of disinformation. In this work, we analysed texts generated by three LLMs (GPT-3.5, LLaMA3, and Qwen) from the CUDRT dataset. We extracted 220 stylistic and statistical features from human-written and AI-generated text using the LFTK library. First, we analysed the features using the Pearson correlation. Second, we trained five machine learning models and tested the classifiers on detecting four kinds of text: completely AI-generated texts, AI-polished texts, AI-rewritten texts, and AI-created summaries. We obtained F1-scores of 90% or higher for text generated entirely by AI, depending on the LLM used. We found that AI-generated texts, independent of the LLM, can be identified through a high Kuperman age, i.e. high word complexity, whereas human-written texts show higher lexical variation and richness. We provide an explanation for the classification results and a comparison with a fine-tuned RoBERTa model.
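The two analysis steps named in the abstract (correlating a feature with the human/AI label, then scoring a classifier with F1) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: it does not use LFTK or the CUDRT dataset, the helper names (`type_token_ratio`, `pearson`, `f1_score`) are hypothetical, the type-token ratio merely stands in for one lexical-variation feature, and the sample texts and labels are invented purely for illustration.

```python
import math


def type_token_ratio(text: str) -> float:
    """Lexical variation proxy: distinct words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A constant sequence has no defined correlation; return 0.0 in that case.
    return cov / (sx * sy) if sx and sy else 0.0


def f1_score(y_true, y_pred, positive=1) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Invented toy samples (NOT from CUDRT); label 1 = AI-generated, 0 = human.
    samples = [
        ("the quick brown fox jumps over a lazy dog", 0),
        ("rain fell softly while we argued about maps", 0),
        ("the model is the model and the model is the tool", 1),
        ("the system uses the system to improve the system", 1),
    ]
    ttr = [type_token_ratio(text) for text, _ in samples]
    labels = [label for _, label in samples]
    print("correlation of TTR with label:", pearson(ttr, labels))
    # A trivial threshold rule standing in for a trained classifier.
    predictions = [1 if value < 0.8 else 0 for value in ttr]
    print("F1 of threshold rule:", f1_score(labels, predictions))
```

In a real replication, the single hand-written feature would be replaced by the 220 LFTK features and the threshold rule by trained classifiers; the correlation and F1 computations keep the same form.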

Institutions

Fraunhofer Institute for Secure Information Technology (DE)

Topics

Artificial Intelligence in Healthcare and Education · Misinformation and Its Impacts · Computational and Text Analysis Methods
