This is an overview page with metadata about this scientific paper. The full article is available from the publisher.
How much Medical Knowledge do LLMs have? An Evaluation of Medical Knowledge Coverage for LLMs
3 citations · 4 authors · 2025
Abstract
Previous evaluation frameworks for large language models (LLMs) have mostly relied on existing question-answering benchmarks. These benchmarks are primarily task-oriented rather than knowledge-oriented. In the medical domain, however, deploying LLMs effectively requires a thorough evaluation of their medical knowledge coverage. To this end, we propose MedKGEval, a systematic evaluation framework that assesses the coverage of medical knowledge in LLMs through the lens of medical knowledge graphs (KGs). MedKGEval converts knowledge at three levels (entity, relation, and subgraph) from a medical KG into distinct groups of question-answer pairs, which serve as comprehensive evaluation benchmarks. In addition to traditional task-oriented evaluation, MedKGEval introduces a novel knowledge-oriented evaluation approach that measures knowledge coverage across entities, relations, and triples. This multi-aspect approach gives a more nuanced picture of LLMs' knowledge coverage in the medical context. Using these benchmarks, we systematically evaluate 11 LLMs from multiple perspectives. The evaluation reveals their strengths and weaknesses in medical knowledge memorization and reasoning.
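The abstract does not specify MedKGEval's question templates or scoring rules. The following is therefore only a hypothetical sketch of the general idea, not the authors' method. It makes several assumptions:

- The KG is stored as (head, relation, tail) triples.
- Each level (entity, relation, subgraph) gets one invented toy question template.
- An entity, relation, or triple counts as "covered" if it appears in at least one correctly answered question.

The names `build_qa_pairs`, `coverage`, and `toy_model` are illustrative. `toy_model` stands in for a real LLM call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


@dataclass(frozen=True)
class QAPair:
    level: str          # hypothetical labels: "entity", "relation", or "subgraph"
    question: str
    answers: frozenset  # any of these strings counts as a correct reply
    sources: tuple      # the KG triples this question was derived from


def build_qa_pairs(triples):
    """Turn KG triples into QA pairs at three levels using toy templates."""
    outgoing = defaultdict(list)  # head entity -> its outgoing triples
    for t in triples:
        outgoing[t.head].append(t)
    qa = []
    # Entity level: one question per head entity; any direct neighbor is accepted.
    for head, ts in outgoing.items():
        qa.append(QAPair("entity", f"Name a concept directly linked to {head}.",
                         frozenset(t.tail for t in ts), tuple(ts)))
    # Relation level: ask which relation connects a given head and tail.
    for t in triples:
        qa.append(QAPair("relation", f"What is the relation between {t.head} and {t.tail}?",
                         frozenset({t.relation}), (t,)))
    # Subgraph level: two-hop chains head -r1-> mid -r2-> tail.
    for t1 in triples:
        for t2 in outgoing.get(t1.tail, []):
            qa.append(QAPair("subgraph",
                             f"Starting from {t1.head}, follow '{t1.relation}' then "
                             f"'{t2.relation}'. Which entity do you reach?",
                             frozenset({t2.tail}), (t1, t2)))
    return qa


def coverage(qa_pairs, answer_fn):
    """Share of entities, relations, and triples that appear in a correctly answered question."""
    all_triples = {t for p in qa_pairs for t in p.sources}
    covered = set()
    for p in qa_pairs:
        reply = answer_fn(p.question).strip().lower()
        if reply in {a.lower() for a in p.answers}:  # exact match after normalization
            covered.update(p.sources)

    def entities(ts):
        return {e for t in ts for e in (t.head, t.tail)}

    def relations(ts):
        return {t.relation for t in ts}

    if not all_triples:
        return {"entity": 0.0, "relation": 0.0, "triple": 0.0}
    return {
        "entity": len(entities(covered)) / len(entities(all_triples)),
        "relation": len(relations(covered)) / len(relations(all_triples)),
        "triple": len(covered) / len(all_triples),
    }


def toy_model(question):
    # Hypothetical stand-in for an LLM call: answers "treats" to every
    # relation question and "unknown" to everything else.
    if question.startswith("What is the relation"):
        return "treats"
    return "unknown"


kg = [
    Triple("metformin", "treats", "type 2 diabetes"),
    Triple("type 2 diabetes", "has_symptom", "polyuria"),
    Triple("aspirin", "treats", "headache"),
]
qa = build_qa_pairs(kg)
scores = coverage(qa, toy_model)
print(len(qa), scores)
```

With this toy KG, the stand-in model covers 2 of 3 triples, 4 of 5 entities, and 1 of 2 relation types. A real evaluation would replace `toy_model` with an actual LLM call. It would also swap the exact-match check for a more tolerant answer matcher.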
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,553 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,444 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,943 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations