This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Making AI More Trustworthy and Morally Aligned by Integrating Human Cognition
Citations: 0
Authors: 5
Year: 2025
Abstract
People often mistrust the moral decisions of AI, in part because it uses opaque black-box processes that differ from human reasoning. We introduce a method—“cognitive” bottlenecks—for more trustworthy and transparent AI by aligning large language models (LLMs) with human moral cognition. Bottlenecks selectively focus AI categorization decisions on a small set of key features, and human moral judgments often similarly center on a small set of key psychological features, like perceived harm, agent intention, and victim vulnerability. We implement and test “cognitively aligned” bottleneck models across multiple LLMs and moral frameworks. Compared with standard end-to-end models, people rate bottleneck models as more transparent and trustworthy. Analyses show that narrowing LLMs’ “focus” to a few key features improves their ability to capture human moral judgments. Implementing cognitively aligned bottlenecks is simple, requiring no additional training or data. This work demonstrates the benefits of integrating psychological theory into AI and offers a scalable path to more morally aligned AI.
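To make the abstract's architecture concrete, here is a minimal sketch of a cognitively aligned bottleneck over an LLM. This is an illustration, not the authors' implementation: the helper `query_llm`, the exact prompts, the 0–100 scale, and the averaging rule are all assumptions. The key property it demonstrates is the one the abstract names: the final judgment is computed only from a few human-legible psychological features (perceived harm, agent intention, victim vulnerability), with no additional training or data.

```python
# Sketch of a "cognitive bottleneck" for moral judgment (illustrative only).
# Instead of asking the model for an end-to-end verdict, the judgment is
# forced through a small set of interpretable features.

FEATURES = ["perceived harm", "agent intention", "victim vulnerability"]


def query_llm(prompt: str) -> str:
    """Placeholder for any LLM call (plug in your own API client)."""
    raise NotImplementedError("connect an LLM client here")


def score_feature(scenario: str, feature: str) -> float:
    """Elicit a 0-100 rating of one psychological feature for the scenario."""
    prompt = (
        f"Scenario: {scenario}\n"
        f"On a scale from 0 to 100, rate the {feature} in this scenario. "
        f"Respond with a single number."
    )
    return float(query_llm(prompt).strip())


def bottleneck_judgment(scenario: str) -> dict:
    """Judge moral wrongness using only the bottleneck features.

    The aggregation step sees the feature scores, never the raw scenario,
    so every decision is traceable to a few human-legible dimensions.
    """
    scores = {f: score_feature(scenario, f) for f in FEATURES}
    # Simple transparent aggregation; a weighted rule could be substituted.
    wrongness = sum(scores.values()) / len(scores)
    return {"features": scores, "wrongness_0_to_100": wrongness}
```

Because the aggregation is a plain average over named features, a user can inspect exactly which feature drove a verdict, which is the transparency property the abstract reports people rating more highly than end-to-end models.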
Related Works
The global landscape of AI ethics guidelines
2019 · 4,588 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,869 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,418 citations
Fairness through awareness
2012 · 3,280 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,183 citations