This is an overview page with metadata for this research paper. The full article is available from the publisher.
When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation
0
Citations
37
Authors
2026
Year
Abstract
Artificial Intelligence (AI) benchmarks play a central role in measuring progress in model development and guiding deployment decisions. However, many benchmarks quickly become saturated, meaning that they can no longer differentiate between the best-performing models, diminishing their long-term value. In this study, we analyze benchmark saturation across 60 Large Language Model (LLM) benchmarks selected from technical reports by major model developers. To identify factors driving saturation, we characterize benchmarks along 14 properties spanning task design, data construction, and evaluation format. We test five hypotheses examining how each property contributes to saturation rates. Our analysis reveals that nearly half of the benchmarks exhibit saturation, with rates increasing as benchmarks age. Notably, hiding test data (i.e., public vs. private) shows no protective effect, while expert-curated benchmarks resist saturation better than crowdsourced ones. Our findings highlight which design choices extend benchmark longevity and inform strategies for more durable evaluation.
Authors
- Mubashara Akhtar
- Anka Reuel
- Prajna Soni
- Sumati Ahuja
- Pawan Sasanka Ammanamanchi
- Ruchit Rawal
- Vilém Zouhar
- Srishti Yadav
- Chenxi Whitehouse
- Dayeon Ki
- Jennifer Mickel
- Liat Ein‐Dor
- Marek Suppa
- Jan Batzner
- Jenny Chim
- Jeba Sania
- Yanan Long
- Hossein A. Rahmani
- Cathryn Knight
- Yiyang Nan
- Jyoutir Raj
- Yu Fan
- Shubham Singh
- Subramanyam Sahoo
- Eliya Habba
- Usman Gohar
- Siddhesh Pawar
- Robert Scholz
- Arjun Subramonian
- Jingwei Ni
- Mykel Kochenderfer
- Sanmi Koyejo
- Mrinmaya Sachan
- Stella Biderman
- Zeerak Talat
- Avijit Ghosh
- Irene Solaiman