This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Benchmarking AI acceptability and grammaticality in German: A study of ChatGPT and human judgments (complete experimental materials: prompts, instructions and stimulus set)

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Abstract

This record contains the complete set of experimental materials used in the study “Benchmarking AI acceptability and grammaticality in German: A study of ChatGPT and human judgments”. The materials are made available to ensure methodological transparency and to enable full replication of the experiment. They document the exact wording of all prompts, instructions and test items used in both the human and AI evaluation conditions.

Contents of this record

1. Initial prompt presented to ChatGPT-4: A short readiness/consent prompt used before the task to establish participation and to standardize the response format. This prompt served as the entry point to the AI condition before the full instructions and the stimulus set were provided.

2. Instructions for participants (humans and ChatGPT-4): The complete introductory page of the survey, identical in content for human participants and for the AI condition. The instructions explain the task and the rating procedure for the acceptability judgment study. The participants were asked to evaluate how natural German sentences sound in everyday usage rather than judging them according to prescriptive grammatical rules. The ratings were collected on a five-point Likert scale ranging from 1 (“very unnatural”) to 5 (“very natural”), with intermediate values (2–4) representing graded levels of acceptability.

3. Stimulus set: The dataset includes 60 contextualized German utterances, each embedded in a short discourse context designed to support interpretation while avoiding excessive pragmatic bias. The items are evenly distributed across five experimental conditions (12 items per condition):

- G1 – Grey-zone items: constructions that are grammatically possible and attested, but often subject to variable acceptability judgments due to norm conflict, optionality, or stylistic markedness.
- G2 – Grammatical controls: uncontroversially grammatical and pragmatically neutral sentences serving as a baseline condition.
- G3 – Ungrammatical items: sentences containing categorical morphosyntactic violations (e.g., case, agreement, word order), embedded in coherent contexts to isolate grammatical deviance as the main source of unacceptability.
- G4 – Diatopically marked items: regionally anchored lexical or morphosyntactic features close to colloquial standard German, designed to probe sensitivity to geographic variation and standard-language expectations.
- G5 – Diastratically marked items: socially and register-marked utterances, including both contextually appropriate and inappropriate uses, targeting the role of sociopragmatic fit in acceptability judgments.

Purpose and scope

These materials were used in a comparative design involving:

(a) native speakers of German;
(b) ChatGPT-4 as the AI evaluation condition,

with identical task instructions and rating scales in both cases.

By providing the exact prompts, instructions and full stimulus inventory, this record allows other researchers to:

- reproduce the experimental procedure,
- test the materials with different participant populations or with different versions of large language models,
- conduct follow-up analyses on acceptability, grammaticality, variation and register sensitivity.

The materials are published for replication, methodological inspection and reuse in future research on linguistic judgment tasks involving both human speakers and large language models.
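
For readers who want to reuse the materials, the design described in the abstract (five conditions, 12 items each, ratings on a 1 to 5 scale) could be represented roughly as in the Python sketch below. This is an illustration only and is not part of the record: the names `CONDITIONS`, `build_item_ids`, `validate_rating` and `mean_by_condition`, as well as the `G1-01` item ID scheme, are assumptions, and the ratings in the usage lines are invented placeholders, not study data.

```python
from collections import defaultdict
from statistics import mean

# Condition labels as listed in the abstract.
CONDITIONS = {
    "G1": "grey-zone",
    "G2": "grammatical control",
    "G3": "ungrammatical",
    "G4": "diatopically marked",
    "G5": "diastratically marked",
}
ITEMS_PER_CONDITION = 12
LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = "very unnatural", 5 = "very natural"


def build_item_ids():
    """Return hypothetical item IDs such as 'G1-01' (the ID scheme is an assumption)."""
    return [
        f"{cond}-{i:02d}"
        for cond in CONDITIONS
        for i in range(1, ITEMS_PER_CONDITION + 1)
    ]


def validate_rating(rating):
    """Return the rating unchanged, or raise ValueError if it is off the scale."""
    is_int = isinstance(rating, int) and not isinstance(rating, bool)
    if not is_int or not LIKERT_MIN <= rating <= LIKERT_MAX:
        raise ValueError(
            f"rating must be an integer from {LIKERT_MIN} to {LIKERT_MAX}, got {rating!r}"
        )
    return rating


def mean_by_condition(ratings):
    """Average ratings per condition; `ratings` maps item ID -> Likert rating."""
    grouped = defaultdict(list)
    for item_id, rating in ratings.items():
        condition = item_id.split("-")[0]
        if condition not in CONDITIONS:
            raise ValueError(f"unknown condition in item ID {item_id!r}")
        grouped[condition].append(validate_rating(rating))
    return {cond: mean(values) for cond, values in grouped.items()}


# Usage: check the balanced design, then average some placeholder ratings.
item_ids = build_item_ids()
print(len(item_ids))  # 5 conditions x 12 items

placeholder_ratings = {"G2-01": 5, "G2-02": 4, "G3-01": 1, "G3-02": 2}  # invented values
print(mean_by_condition(placeholder_ratings))
```

The same functions could be applied to the human and the ChatGPT-4 ratings separately, since both conditions used the same scale and the same items.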

Authors

Nicholas Catasso

Institutions

University of Wuppertal (DE)

Topics

Neurobiology of Language and Bilingualism; Artificial Intelligence in Healthcare and Education; Explainable Artificial Intelligence (XAI)
