José Hernández‐Orallo
Relevant Work
Most-Cited Publications in Health & MedTech
Larger and more instructable language models become less reliable
2024 · 125 citations · Nature
Rethink reporting of evaluation results in AI
2023 · 82 citations · Science
AI Watch: Methodology to Monitor the Evolution of AI Technologies
2020 · 4 citations · RePEc: Research Papers in Economics
How General-Purpose Is a Language Model? Usefulness and Safety with Human Prompters in the Wild
2022 · 2 citations · Proceedings of the AAAI Conference on Artificial Intelligence
Measuring Data Science Automation: A Survey of Evaluation Tools for AI Assistants and Agents
2025 · 1 citation · arXiv
Conversational complexity for assessing risk in large language models
2025 · 1 citation · EPJ Data Science
The Association for the Advancement of Artificial Intelligence 2020 Workshop Program
2020 · 1 citation · AI Magazine
An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI
2023 · 1 citation · arXiv
Beyond Benchmarks: Evaluating Generalist Medical Artificial Intelligence With Psychometrics
2025 · 0 citations · Journal of Medical Internet Research
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
2025 · 0 citations · arXiv