Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

HumanELY: Human evaluation of LLM yield, using a novel web-based evaluation tool

2023·20 Zitationen·medRxivOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2023

Jahr

Abstract

A bstract Large language models (LLMs) have caught the imagination of researchers,developers and public in general the world over with their potential for transformation. Vast amounts of research and development resources are being provided to implement these models in all facets of life. Trained using billions of parameters, various measures of their accuracy and performance have been proposed and used in recent times. While many of the automated natural language assessment parameters measure LLM output performance for use of language, contextual outputs are still hard to measure and quantify. Hence, human evaluation is still an important measure of LLM performance,even though it has been applied variably and inconsistently due to lack of guidance and resource limitations. To provide a structured way to perform comprehensive human evaluation of LLM output, we propose the first guidance and tool called HumanELY. Our approach and tool built using prior knowledge helps perform evaluation of LLM outputs in a comprehensive, consistent, measurable and comparable manner. HumanELY comprises of five key evaluation metrics: relevance, coverage, coherence, harm and comparison. Additional submetrics within these five key metrics provide for Likert scale based human evaluation of LLM outputs. Our related webtool uses this HumanELY guidance to enable LLM evaluation and provide data for comparison against different users performing human evaluation. While all metrics may not be relevant and pertinent to all outputs, it is important to assess and address their use. Lastly, we demonstrate comparison of metrics used in HumanELY against some of the recent publications in the healthcare domain. We focused on the healthcare domain due to the need to demonstrate highest levels of accuracy and lowest levels of harm in a comprehensive manner. We anticipate our guidance and tool to be used for any domain where LLMs find an use case. Link to the HumanELY Tool https://www.brainxai.com/humanely

Autoren

Themen

Natural Language Processing TechniquesTopic ModelingArtificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

HumanELY: Human evaluation of LLM yield, using a novel web-based evaluation tool

Abstract

Ähnliche Arbeiten

Autoren

Themen