Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz

2024·0 Zitationen·arXiv (Cornell University)Open Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2024

Jahr

Abstract

This research introduces a novel evaluation framework designed to assess large language models' (LLMs) ability to acknowledge uncertainty on 675 fundamentally unsolvable problems. Using a curated dataset of graduate-level grand challenge questions with intentionally unknowable answers, we evaluated twelve state-of-the-art LLMs, including both open and closed-source models, on their propensity to admit ignorance rather than generate plausible but incorrect responses. The best models scored in 62-68% accuracy ranges for admitting the problem solution was unknown in fields ranging from biology to philosophy and mathematics. We observed an inverse relationship between problem difficulty and model accuracy, with GPT-4 demonstrating higher rates of uncertainty acknowledgment on more challenging problems (35.8%) compared to simpler ones (20.0%). This pattern indicates that models may be more prone to generate speculative answers when problems appear more tractable. The study also revealed significant variations across problem categories, with models showing difficulty in acknowledging uncertainty in invention and NP-hard problems while performing relatively better on philosophical and psychological challenges. These results contribute to the growing body of research on artificial general intelligence (AGI) assessment by highlighting the importance of uncertainty recognition as a critical component of future machine intelligence evaluation. This impossibility test thus extends previous theoretical frameworks for universal intelligence testing by providing empirical evidence of current limitations in LLMs' ability to recognize their own knowledge boundaries, suggesting new directions for improving model training architectures and evaluation approaches.

Autoren

Themen

Explainable Artificial Intelligence (XAI)Artificial Intelligence in Healthcare and EducationEthics and Social Impacts of AI

Volltext beim Verlag öffnen

The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz

Abstract

Ähnliche Arbeiten

Autoren

Themen