This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Dissecting HealthBench: Disease Spectrum, Clinical Diversity, and Data Insights from Multi-Turn Clinical AI Evaluation Benchmark
Citations: 2
Authors: 2
Year: 2025
Abstract
HealthBench is an open-source, large-scale benchmark consisting of 5,000 multi-turn clinical conversations evaluated against 48,562 criteria developed by clinicians. Recognized as a significant advancement in assessing realistic artificial intelligence (AI) models, HealthBench deserves further exploration. In this article, we systematically analyze the benchmark's disease spectrum, diagnostic and therapeutic focuses, and demographic diversity. We evaluate its representativeness and strengths, as well as the essential limitations that AI researchers and clinicians should consider when using it for realistic model evaluations.