This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management
Citations: 0
Authors: 69
Year: 2025
Abstract
This second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk management for general-purpose AI. In recent months, for example, three leading AI developers applied enhanced safeguards to their new models, as their internal pre-deployment testing could not rule out the possibility that these models could be misused to help create biological weapons. Beyond specific precautionary measures, there have been a range of other advances in techniques for making AI models and systems more reliable and resistant to misuse. These include new approaches in adversarial training, data curation, and monitoring systems. In parallel, institutional frameworks that operationalise and formalise these technical capabilities are starting to emerge: the number of companies publishing Frontier AI Safety Frameworks more than doubled in 2025, and governments and international organisations have established a small number of governance frameworks for general-purpose AI, focusing largely on transparency and risk assessment.
Related works
The global landscape of AI ethics guidelines
2019 · 4,482 citations
The Limitations of Deep Learning in Adversarial Settings
2016 · 3,853 citations
Trust in Automation: Designing for Appropriate Reliance
2004 · 3,362 citations
Fairness through awareness
2012 · 3,258 citations
Mind over Machine: The Power of Human Intuition and Expertise in the Era of the Computer
1987 · 3,182 citations
Authors
- Yoshua Bengio
- Stephen Clare
- Carina Prunkl
- Maksym Andriushchenko
- Ben Bucknall
- Peter Fox
- Nestor Maslej
- Conor McGlynn
- Melanie Murray
- Shalaleh Rismani
- Stephen Casper
- Jessica Newman
- Daniel Privitera
- Sören Mindermann
- Daron Acemoğlu
- Thomas G. Dietterich
- F. Heintz
- Geoffrey E. Hinton
- Nicholas R. Jennings
- Susan Leavy
- Teresa Ludermir
- Vidushi Marda
- Helen Margetts
- John McDermid
- Jane Munga
- Arvind Narayanan
- Alondra Nelson
- Clara Neppel
- Sarvapali D. Ramchurn
- Stuart Russell
- Marietje Schaake
- Bernhard Schölkopf
- A. Soto
- Lee Tiedrich
- Gaël Varoquaux
- Andrew I. Yao
- Ya-Qin Zhang
- Leandro Aguirre
- Oluremi N Ajala
- Fahad Albalawi
- Noora AlMalek
- Christoph Busch
- André Carvalho
- Jonathan Collas
- Amandeep S. Gill
- Ahmet Hatip
- J. K. Heikkilä
- Chris J. Johnson
- Gill Jolly
- Ziv Katzir
- Mary Kerema
- Hiroaki Kitano
- Adéle Kruger
- Aoife McLysaght
- Oleksii Molchanovskyi
- Alessio Monti
- Kyoung Mu Lee
- Mona Nemer
- Nuria Oliver
- R. Pezoa
- Audrey Plonk
- José Portillo
- Balaraman Ravindran
- Hammam Riza
- Crystal Rugege
- Haroon Sheikh
- David Wong
- Yi Zeng
- Liming Zhu