This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Contrastive siamese network for detecting AI-generated text across domains and models
Citations: 1
Authors: 4
Year: 2025
Abstract
The rapid proliferation of large language models (LLMs) has raised growing concerns about distinguishing between human-written and AI-generated text. This work addresses the task of detecting AI-generated content by evaluating the latent similarity between a given input text and an alternative response generated for the same prompt, either known or inferred. Accordingly, CLAID (Contrastive Learning for AI Detection) is proposed as a Siamese Neural Network architecture utilizing BERT encoders and contrastive loss to capture semantic similarity between text pairs. Unlike prior approaches that rely on explicit classification or domain-specific features, our method focuses on modeling pairwise similarity, enabling a flexible and model-agnostic detection framework. To evaluate the generalization capabilities of the system, a comprehensive multi-domain and multi-model benchmark comprising three diverse datasets (i.e., HC3, DAIGT, and OUTFOX), encompassing a wide range of text genres, prompt structures, and generative models, has been constructed. Experimental results show that the proposed model achieves near-perfect classification accuracy across both single-domain and mixed-domain scenarios, demonstrating strong robustness to domain shifts, prompt variability, and authorship ambiguity. The model also exhibits strong data efficiency, attaining high performance with minimal supervision.
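The pairwise idea in the abstract can be illustrated with a minimal sketch of a contrastive loss over two embeddings. The abstract does not give the exact loss, distance metric, margin, or decision rule used by CLAID, so everything below is an assumption: the code uses the classic margin-based contrastive loss with Euclidean distance, plain Python lists stand in for the BERT encoder outputs, and the names `contrastive_loss`, `predict_same_source`, `margin`, and `threshold` are hypothetical. The label convention (a pair is "similar" when the input text and the alternative response come from the same kind of source) is also assumed for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_source, margin=1.0):
    """Classic margin-based contrastive loss for one embedding pair.

    Assumption: this is the textbook formulation, not necessarily the
    loss used in the paper.
    - same_source=True: pull the pair together (loss = d^2).
    - same_source=False: push the pair apart until the distance is at
      least `margin` (loss = max(0, margin - d)^2).
    """
    d = euclidean_distance(u, v)
    if same_source:
        return d ** 2
    return max(0.0, margin - d) ** 2


def predict_same_source(u, v, threshold=0.5):
    """Hypothetical decision rule: a pair closer than `threshold` is
    treated as coming from the same kind of source."""
    return euclidean_distance(u, v) < threshold


# Toy embeddings standing in for encoder outputs (made-up values).
input_text_embedding = [0.0, 0.0]
alternative_response_embedding = [3.0, 4.0]

loss_if_similar = contrastive_loss(
    input_text_embedding, alternative_response_embedding, same_source=True
)
loss_if_dissimilar = contrastive_loss(
    input_text_embedding, alternative_response_embedding, same_source=False
)
```

In a Siamese setup, both texts would pass through the same encoder with shared weights before the loss is applied. The sketch skips the encoder and shows only how the loss rewards small distances for similar pairs and distances beyond the margin for dissimilar pairs.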
Similar works
Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
2004 · 4,357 citations
Manual for Raven's progressive matrices and vocabulary scales
1998 · 4,215 citations
The mathematics of statistical machine translation: parameter estimation
1993 · 4,118 citations
Word association norms, mutual information, and lexicography
1990 · 3,665 citations
Language identification in the limit
1967 · 3,572 citations