This is an overview page with metadata for this scholarly work. An external link to the full text is currently not available.

Eliciting Trustworthiness Priors of Large Language Models via Economic Games

2026 · 0 citations · Open MIND · Open Access

Abstract

One critical aspect of building human-centered, trustworthy artificial intelligence (AI) systems is maintaining calibrated trust: appropriate reliance on AI systems outperforms both overtrust (e.g., automation bias) and undertrust (e.g., disuse). A fundamental challenge, however, is how to characterize the level of trust exhibited by an AI system itself. Here, we propose a novel elicitation method based on iterated in-context learning (Zhu and Griffiths, 2024a) and apply it to elicit trustworthiness priors using the Trust Game from behavioral game theory. The Trust Game is particularly well suited for this purpose because it operationalizes trust as voluntary exposure to risk based on beliefs about another agent, rather than self-reported attitudes. Using our method, we elicit trustworthiness priors from several leading large language models (LLMs) and find that GPT-4.1's trustworthiness priors closely track those observed in humans. Building on this result, we further examine how GPT-4.1 responds to different player personas in the Trust Game, providing an initial characterization of how such models differentiate trust across agent characteristics. Finally, we show that variation in elicited trustworthiness can be well predicted by a stereotype-based model grounded in perceived warmth and competence.
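The iterated-learning principle behind the elicitation method can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It replaces the LLM with an idealized Bayesian learner that has an assumed Beta(2, 5) prior over θ, the probability that a trustee reciprocates, and it assumes that 10 Trust Game outcomes are passed from one generation to the next. Each generation updates on the previous generation's outcomes, samples θ from its posterior, and generates fresh outcomes for the next generation. For learners of this kind, the chain's stationary distribution over θ equals the learner's prior, so the long-run state of the chain reveals that prior. All names (A, B, N_ROUNDS, learn_and_sample, generate_outcomes, run_chain) are hypothetical.

```python
import random
import statistics

# ASSUMPTION: an idealized Bayesian learner stands in for the LLM.
# Its prior over theta (probability that a trustee reciprocates) is Beta(A, B).
A, B = 2.0, 5.0   # hypothetical prior parameters, chosen only for illustration
N_ROUNDS = 10     # hypothetical number of Trust Game outcomes passed between generations


def learn_and_sample(reciprocated: int, rng: random.Random) -> float:
    """Update the Beta prior on the observed outcomes and sample theta from the posterior."""
    return rng.betavariate(A + reciprocated, B + N_ROUNDS - reciprocated)


def generate_outcomes(theta: float, rng: random.Random) -> int:
    """Simulate N_ROUNDS trustee decisions and count how many reciprocate."""
    return sum(rng.random() < theta for _ in range(N_ROUNDS))


def run_chain(generations: int, rng: random.Random) -> list[float]:
    """Run one iterated-learning chain and return the theta sampled at each generation."""
    data = rng.randint(0, N_ROUNDS)  # arbitrary seed data for the first learner
    thetas = []
    for _ in range(generations):
        theta = learn_and_sample(data, rng)
        thetas.append(theta)
        # This generation's output becomes the next generation's input.
        data = generate_outcomes(theta, rng)
    return thetas


rng = random.Random(0)
# Keep only the final state of many independent chains, after 50 generations of burn-in.
final_thetas = [run_chain(50, rng)[-1] for _ in range(2000)]
print(f"chain mean: {statistics.mean(final_thetas):.3f}")
print(f"prior mean: {A / (A + B):.3f}")
```

With many independent chains, the mean of the final θ values should land close to the prior mean of 2/7 (about 0.286), even though no learner ever reports its prior directly. Under the method's assumptions, running the same loop with an LLM in place of the Bayesian learner would instead approximate whatever prior the model implicitly brings to the task.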

Topics

Artificial Intelligence in Healthcare and Education
AI in Service Interactions
Explainable Artificial Intelligence (XAI)