This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Making AI More Trustworthy and Morally Aligned by Integrating Human Cognition
Citations: 0
Authors: 5
Year: 2025
Abstract
People often mistrust the moral decisions of AI, in part because it uses opaque black-box processes that differ from human reasoning. We introduce a method—“cognitive” bottlenecks—for more trustworthy and transparent AI by aligning large language models (LLMs) with human moral cognition. Bottlenecks selectively focus AI categorization decisions on a small set of key features, and human moral judgments often similarly center on a small set of key psychological features, like perceived harm, agent intention, and victim vulnerability. We implement and test “cognitively aligned” bottleneck models across multiple LLMs and moral frameworks. Compared with standard end-to-end models, people rate bottleneck models as more transparent and trustworthy. Analyses show that narrowing LLMs’ “focus” to a few key features improves their ability to capture human moral judgments. Implementing cognitively aligned bottlenecks is simple, requiring no additional training or data. This work demonstrates the benefits of integrating psychological theory into AI and offers a scalable path to more morally aligned AI.
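To make the abstract's architecture concrete, here is a minimal sketch of a cognitively aligned bottleneck over an LLM. This is an illustration, not the authors' implementation: the helper `query_llm`, the exact prompts, the 0–100 scale, and the averaging rule are all assumptions. The key property it demonstrates is the one the abstract names: the final judgment is computed only from a few human-legible psychological features (perceived harm, agent intention, victim vulnerability), with no additional training or data.

```python
# Sketch of a "cognitive bottleneck" for moral judgment (illustrative only).
# Instead of asking the model for an end-to-end verdict, the judgment is
# forced through a small set of interpretable features.

FEATURES = ["perceived harm", "agent intention", "victim vulnerability"]


def query_llm(prompt: str) -> str:
    """Placeholder for any LLM call (plug in your own API client)."""
    raise NotImplementedError("connect an LLM client here")


def score_feature(scenario: str, feature: str) -> float:
    """Elicit a 0-100 rating of one psychological feature for the scenario."""
    prompt = (
        f"Scenario: {scenario}\n"
        f"On a scale from 0 to 100, rate the {feature} in this scenario. "
        f"Respond with a single number."
    )
    return float(query_llm(prompt).strip())


def bottleneck_judgment(scenario: str) -> dict:
    """Judge moral wrongness using only the bottleneck features.

    The aggregation step sees the feature scores, never the raw scenario,
    so every decision is traceable to a few human-legible dimensions.
    """
    scores = {f: score_feature(scenario, f) for f in FEATURES}
    # Simple transparent aggregation; a weighted rule could be substituted.
    wrongness = sum(scores.values()) / len(scores)
    return {"features": scores, "wrongness_0_to_100": wrongness}
```

Because the aggregation is a plain average over named features, a user can inspect exactly which feature drove a verdict, which is the transparency property the abstract reports people rating more highly than end-to-end models.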
Related Works
The global landscape of AI ethics guidelines
2019 · 4,588 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,869 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,418 citations
Fairness through awareness
2012 · 3,280 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,183 citations