This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Are Large Language Models Reliable across Services and over (a Short) Time? An Exploratory Study in Sociology with Pedagogical Implications
Citations: 0
Authors: 2
Year: 2026
Abstract
Amid educational debates about generative artificial intelligence (GenAI), little research focuses on large language models’ (LLMs) reliability, which has important implications regardless of whether students are permitted to use LLMs in sociology classrooms. In this exploratory study, we focus on the intersection of GenAI and teaching and learning in sociology, asking: To what extent are LLM services, including ChatGPT, DeepSeek, and Gemini, reliable (a) with one another and (b) over (a short) time? We administered a sociology quiz with 20 multiple-choice questions of varying difficulty—and covering different topics, some of which are sensitive—to each of these LLM services over the course of two seven-day intervals: April 21 through 27, 2025, and June 13 through 19, 2025. The results indicate very high levels of reliability between LLM services and over these time intervals, but some unreliability mainly on two questions that were sensitive, required higher-level thinking, or both. Pedagogical and other implications are discussed.
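The abstract does not say which reliability statistic the authors used. As a minimal sketch of one simple option, the Python below computes pairwise percent agreement between services on a multiple-choice quiz. The service names, the five-item answer lists, and the function names are all hypothetical and are not data or methods from the study.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Return the share of quiz items on which two answer sequences match."""
    if len(a) != len(b):
        raise ValueError("answer sequences must have the same length")
    if not a:
        raise ValueError("answer sequences must not be empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pairwise_agreement(answers):
    """Compute percent agreement for every pair of services.

    `answers` maps a service name to its list of chosen options,
    one entry per quiz question, in the same question order.
    """
    return {
        (s1, s2): percent_agreement(answers[s1], answers[s2])
        for s1, s2 in combinations(sorted(answers), 2)
    }


# Hypothetical answers to a 5-item quiz (illustration only, NOT study data).
toy_answers = {
    "service_a": ["A", "C", "B", "D", "A"],
    "service_b": ["A", "C", "B", "D", "B"],
    "service_c": ["A", "C", "D", "D", "A"],
}

result = pairwise_agreement(toy_answers)
for pair, score in result.items():
    print(pair, round(score, 2))
```

The same function could compare one service's answers on two dates to gauge stability over time. Percent agreement does not correct for agreement expected by chance, so a chance-corrected coefficient may be preferable in practice.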
Similar works
2019 · 31,762 citations
Techniques to Identify Themes
2003 · 5,393 citations
Answering the Call for a Standard Reliability Measure for Coding Data
2007 · 4,086 citations
Basic Content Analysis
1990 · 4,045 citations
Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
2013 · 3,079 citations