This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Data Plus Theory Equals Codebook: Leveraging LLMs for Human-AI Codebook Development

2026 · 0 citations · Zenodo (CERN European Organization for Nuclear Research) · Open Access

Abstract

Recent research has explored the use of Large Language Models (LLMs) to develop qualitative codebooks, mainly for inductive work with large datasets, where manual review is impractical. Although these efforts show promise, they often neglect the theoretical grounding essential to many types of qualitative analysis. This paper investigates the potential of GPT-4o to support theory-informed codebook development across two educational contexts. In the first study, we employ a three-step approach—drawing on Winne & Hadwin's and Zimmerman's Self-Regulated Learning (SRL) theories, think-aloud data, and human refinement—to evaluate GPT-4o's ability to generate high-quality, theory-aligned codebooks. Results indicate that GPT-4o can effectively leverage its knowledge base to identify SRL constructs reflected in student problem-solving behavior. In the second study, we extend this approach to a STEM game-based learning context guided by Hidi & Renninger's four-phase model of Interest Development. We compare four prompting strategies: no theories provided, theories named, full references given, and full-text theory papers supplied. Human evaluations show that naming the theory without including full references produced the most practical and usable codebook, while supplying full papers to the prompt enhanced theoretical alignment but reduced applicability. These findings suggest that GPT-4o can be a valuable partner in theory-driven qualitative research when grounded in well-established frameworks, but that attention to prompt design is required. Our results show that widely available foundation models—trained on large-scale open web and licensed datasets—can effectively distill established educational theories to support qualitative research and codebook development. The code for our codebook development process and all the employed prompts and codebooks produced by GPT are available for replication purposes at: https://osf.io/g3z4x.
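The four prompting strategies compared in the second study can be pictured as four variants of one prompt template. The sketch below is only an illustration under stated assumptions: the names `TheoryCondition` and `build_prompt`, the prompt wording, and the placeholder strings are hypothetical and are not taken from the paper. The prompts actually used are those published in the OSF repository linked above.

```python
from enum import Enum


class TheoryCondition(Enum):
    """The four conditions named in the abstract (labels copied from it)."""
    NONE = "no theories provided"
    NAMED = "theories named"
    REFERENCED = "full references given"
    FULL_TEXT = "full-text theory papers supplied"


def build_prompt(condition, data_excerpt, theory_name="", reference="", full_text=""):
    """Assemble a codebook-generation prompt for one condition.

    The instruction wording is hypothetical, not the authors' prompt.
    """
    parts = ["Develop a qualitative codebook for the following data."]
    if condition is TheoryCondition.NAMED:
        parts.append(f"Ground the codes in this theory: {theory_name}.")
    elif condition is TheoryCondition.REFERENCED:
        parts.append(f"Ground the codes in this work: {reference}.")
    elif condition is TheoryCondition.FULL_TEXT:
        parts.append(f"Ground the codes in the theory paper below.\n{full_text}")
    # TheoryCondition.NONE adds no theory information at all.
    parts.append(f"Data:\n{data_excerpt}")
    return "\n\n".join(parts)


# Placeholder inputs; real data excerpts and reference text would go here.
prompts = {
    condition: build_prompt(
        condition,
        data_excerpt="<transcript excerpt>",
        theory_name="Hidi & Renninger's four-phase model of Interest Development",
        reference="<full bibliographic reference>",
        full_text="<full text of the theory paper>",
    )
    for condition in TheoryCondition
}
```

Keeping the base instruction and the data identical across conditions means any difference between the resulting codebooks can be attributed to how much theory material the prompt contained.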

Topics

Computational and Text Analysis Methods · Qualitative Research Methods and Applications · Artificial Intelligence in Healthcare and Education
