This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Structured taxonomy and framework for developing medical benchmark in large language models derived from scoping review

2026 · 0 citations · npj Digital Medicine · Open Access

Abstract

With the rapid advancement of large language model (LLM) technology, numerous studies have explored its application in the medical field. Robust evaluation is crucial for ensuring reliability and safety, leading to the development of diverse benchmark datasets. In this study, we propose a structured taxonomy to provide researchers with practical guidance for benchmark selection. Furthermore, we introduce READY, a development framework built on five principles (Reliable, Ethical, Annotated, Diverse, and Yield-validated) to support the systematic design of medical benchmarks and strengthen future evaluation practices. To establish the taxonomy and framework, we systematically reviewed benchmark datasets designed for evaluating LLMs in medical contexts. A comprehensive literature search yielded 55 relevant studies. Each benchmark was analyzed using a structured framework covering dataset construction and evaluation methodology. To assess the applicability of the proposed framework, five domain experts independently applied the READY framework to benchmark studies, demonstrating consistent inter-rater agreement. We anticipate that this research will promote more rigorous and ethical LLM evaluation, paving the way for the safe application of LLMs in clinical settings.

Topics

Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare and Education
