Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Data Plus Theory Equals Codebook: Leveraging LLMs for Human-AI Code-book Development
0
Zitationen
11
Autoren
2026
Jahr
Abstract
Recent research has explored the use of Large Language Models (LLMs) to develop qualitative codebooks, mainly for inductive work with large datasets, where manual review is impractical. Although these efforts show promise, they often neglect the theoretical grounding essential to many types of qualitative analysis. This paper investigates the potential of GPT-4o to support theory-informed codebook development across two educational contexts. In the first study, we employ a three-step approach—drawing on Winne & Hadwin's and Zimmerman's Self-Regulated Learning (SRL) theories, think-aloud data, and human refinement—to evaluate GPT-4o's ability to generate high-quality, theory-aligned codebooks. Results indicate that GPT-4o can effectively leverage its knowledge base to identify SRL constructs reflected in student problem-solving behavior. In the second study, we extend this approach to a STEM game-based learning context guided by Hidi & Renninger's four-phase model of Interest Development. We compare four prompting strategies: no theories provided, theories named, full references given, and full-text theory papers supplied. Human evaluations show that naming the theory without including full references produced the most practical and usable codebook, while supplying full papers to the prompt enhanced theoretical alignment but reduced applicability. These findings suggest that GPT-4o can be a valuable partner in theory-driven qualitative research when grounded in well-established frameworks, but that attention to prompt design is required. Our results show that widely available foundation models—trained on large-scale open web and licensed datasets—can effectively distill established educational theories to support qualitative research and codebook development. The code for our codebook development process and all the employed prompts and codebooks produced by GPT are available for replication purposes at: https://osf.io/g3z4x.
Ähnliche Arbeiten
2019 · 31.551 Zit.
Techniques to Identify Themes
2003 · 5.374 Zit.
Answering the Call for a Standard Reliability Measure for Coding Data
2007 · 4.063 Zit.
Basic Content Analysis
1990 · 4.045 Zit.
Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
2013 · 3.049 Zit.