This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Challenges and Choices when Evaluating Alignment in Human-AI Systems
Citations: 0
Authors: 2
Year: 2025
Abstract
Aligning AI to human values is a current research endeavor where much focus goes to training AI systems to align with values, goals and tasks. But evaluating whether those aligned systems are actually better and more trusted by human users is an essential part of improving such systems. We present three challenges encountered in the evaluation of aligned AI systems. We present possible solutions to these challenges, discuss our own and alternative design choices, and outline next steps for AI alignment research to flourish.
Similar Works
The global landscape of AI ethics guidelines
2019 · 4,514 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,859 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,386 citations
Fairness through awareness
2012 · 3,269 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,183 citations