José Hernández‐Orallo

314 works · 8,159 citations

Relevant works

Most-cited publications in Health & MedTech

Larger and more instructable language models become less reliable

2024 · 125 citations · Nature

Rethink reporting of evaluation results in AI

2023 · 82 citations · Science

AI Watch: Methodology to Monitor the Evolution of AI Technologies

2020 · 4 citations · RePEc: Research Papers in Economics

How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild

2022 · 2 citations · Proceedings of the AAAI Conference on Artificial Intelligence

Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents

2025 · 1 citation · ArXiv.org

Conversational complexity for assessing risk in large language models

2025 · 1 citation · EPJ Data Science

The Association for the Advancement of Artificial Intelligence 2020 Workshop Program

2020 · 1 citation · AI Magazine

An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI

2023 · 1 citation · arXiv (Cornell University)

Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics

2025 · 0 citations · Journal of Medical Internet Research

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

2025 · 0 citations · ArXiv.org