OpenAlex · Updated hourly · Last updated: 10 May 2026, 13:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Entity-centric evaluation of large language model responses for medical question-answering tasks

2025 · 0 citations · medRxiv · Open Access
Open full text at the publisher

0 citations · 2 authors · 2025

Abstract

Objective: Develop a metric for evaluating the clinical alignment and informativeness of large language model (LLM)-generated responses in medical question-answering (QA) tasks.

Materials and Methods: We propose EntQA, an entity-centric metric that extracts biomedical entities from patient backgrounds, diagnostic questions, and LLM responses using a biomedical named entity recognition model, followed by de-duplication and semantic/lexical matching with thresholds. We computed recall-style coverage scores to quantify entity retention and detect omissions without external resources. We evaluated EntQA on five benchmarks using seven Qwen 2.5 Instruct models (0.5B-72B parameters), comparing it to baselines via Spearman/Kendall correlations with model accuracy at the group level, point-biserial correlations at the case level, and Spearman correlations with model scaling.

Results: EntQA demonstrated consistent positive alignments with accuracy (group-level Spearman up to 0.9286; case-level point-biserial up to 0.0926) and model scaling (Spearman up to 0.252), outperforming baselines, which often showed negative or inconsistent correlations (e.g., BERTScore Spearman -0.9286 with accuracy).

Conclusion: EntQA offers a scalable, interpretable evaluation for LLM medical QA, outperforming traditional metrics in capturing clinical fidelity and supporting trustworthy healthcare AI through applications in fact-checking and model refinement.
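The recall-style coverage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract specifies a biomedical NER model plus semantic and lexical matching with thresholds, while the toy below assumes entities are already extracted and uses only case-insensitive de-duplication and a token-Jaccard lexical match; the function name, threshold value, and matching rule are all hypothetical.

```python
def entity_coverage(reference_entities, response_entities, threshold=0.8):
    """Toy recall-style coverage: fraction of reference entities found
    in the response, by exact match or token-Jaccard >= threshold.
    (Illustrative only; EntQA additionally uses NER and semantic matching.)"""
    # De-duplicate entities case-insensitively
    refs = {e.lower() for e in reference_entities}
    resp = {e.lower() for e in response_entities}

    def matched(ref):
        if ref in resp:  # exact lexical match
            return True
        rt = set(ref.split())
        # Simple lexical similarity: token-level Jaccard against each candidate
        return any(
            len(rt & set(c.split())) / len(rt | set(c.split())) >= threshold
            for c in resp
        )

    if not refs:
        return 1.0  # nothing to cover
    return sum(matched(r) for r in refs) / len(refs)


# "hypertension" matches exactly; "type 2 diabetes" vs "diabetes" has
# Jaccard 1/3 < 0.8, so it counts as an omission -> coverage 0.5
score = entity_coverage(["hypertension", "type 2 diabetes"],
                        ["Hypertension", "diabetes"])
print(score)  # 0.5
```

A low score flags responses that drop clinically relevant entities from the patient background or question, which is the omission-detection behavior the metric is built around.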


Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare