OpenAlex · Updated hourly · Last updated: 13.03.2026, 00:57

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents

2026 · 0 citations · Transactions of the Association for Computational Linguistics · Open Access

0 citations · 8 authors · Year: 2026

Abstract

Automated agents, powered by large language models (LLMs), are emerging as the go-to tool for querying information. However, evaluation benchmarks for LLM agents rarely feature natural questions that are both information-seeking and genuinely time-consuming for humans. To address this gap, we introduce MoNaCo, a benchmark of 1,315 natural and time-consuming questions that require dozens, and at times hundreds, of intermediate steps to solve, far more than any existing QA benchmark. To build MoNaCo, we developed a decomposed annotation pipeline to elicit and manually answer real-world time-consuming questions at scale. Frontier LLMs evaluated on MoNaCo achieve at most 61.2% F1, hampered by low recall and hallucinations. Our results underscore the limitations of LLM-powered agents in handling the complexity and sheer breadth of real-world information-seeking tasks, with MoNaCo providing an effective resource for tracking such progress. The MoNaCo benchmark, codebase, prompts, and model predictions are all publicly available at: https://tomerwolgithub.github.io/monaco.
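The F1 figure cited above is the harmonic mean of precision and recall, so a low recall caps the score even when precision is high. A minimal sketch with illustrative values (not numbers from the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# High precision but low recall still drags F1 well below precision.
print(round(f1_score(0.9, 0.5), 3))  # 0.643
```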

Similar works