This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Standardized Assessment Framework for Evaluations of Large Language Models in Medicine (SAFE-LLM)

2025 · 2 citations · Preprints.org · Open Access

Abstract

Large language models (LLMs) are AI-powered systems that have demonstrated significant potential in various fields, including medicine. Despite their promise, the methods for evaluating their performance in medical contexts remain inconsistent. This paper introduces the Standardized Assessment Framework for Evaluations of Large Language Models (SAFE-LLM) to streamline and standardize the evaluation of LLMs in healthcare. SAFE-LLM assesses five domains: accuracy, comprehensiveness, supplementation, consistency, and fluency. Accuracy refers to the correctness of the model's response, comprehensiveness to the detail and reasoning provided, supplementation to additional relevant information, consistency to uniformity in repeated answers, and fluency to the coherence of responses. Each prompt is given three times, with responses evaluated by two independent experts. Discrepancies between evaluations trigger a third assessment to ensure reliability. Grading is performed on a scale specific to each domain, with a maximum possible score of seven points. The SAFE-LLM score can be applied to individual answers or averaged across responses for a holistic assessment. This framework aims to unify evaluation standards, facilitating the comparison and improvement of LLMs in medical applications. Developing standardized evaluation tools like SAFE-LLM is critical for integrating AI into healthcare effectively. This framework is a preliminary step towards more rigorous and comparable assessments of LLMs, enhancing their applicability and trustworthiness in medical settings.
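The scoring procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the per-domain scales or how a third assessment is combined with the first two, so the function names, the median-based tie-break, and the example grades are all assumptions. Only the five domain names, the two-expert rule with a third assessment on disagreement, and the seven-point maximum come from the abstract.

```python
from statistics import mean, median

# Domain names and the seven-point maximum are taken from the abstract.
DOMAINS = ("accuracy", "comprehensiveness", "supplementation",
           "consistency", "fluency")
MAX_TOTAL = 7


def resolve_domain(rater_a, rater_b, rater_c=None):
    """Return a single grade for one domain.

    Assumption: if the two experts disagree, the median of all three
    grades is used. The abstract only says that a third assessment is
    triggered, not how it is combined.
    """
    if rater_a == rater_b:
        return rater_a
    if rater_c is None:
        raise ValueError("experts disagree; a third assessment is required")
    return median([rater_a, rater_b, rater_c])


def safe_llm_score(ratings):
    """Sum the resolved domain grades for a single response.

    `ratings` maps each domain name to a tuple of two or three grades.
    """
    total = sum(resolve_domain(*ratings[domain]) for domain in DOMAINS)
    if total > MAX_TOTAL:
        raise ValueError("total exceeds the seven-point maximum")
    return total


def mean_safe_llm_score(responses):
    """Average the per-response scores for a holistic assessment."""
    return mean(safe_llm_score(r) for r in responses)


# Hypothetical grades, for illustration only.
example = {
    "accuracy": (2, 2),
    "comprehensiveness": (1, 2, 2),  # disagreement resolved by a third rater
    "supplementation": (1, 1),
    "consistency": (1, 1),
    "fluency": (1, 1),
}
```

Calling `safe_llm_score(example)` scores one response, and `mean_safe_llm_score` averages over a list of such dictionaries, for example the three repetitions of a prompt.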

Institutions

Janbazan Medical and Engineering Research Center (IR)

Topics

Radiomics and Machine Learning in Medical Imaging

Biomedical Text Mining and Ontologies

Artificial Intelligence in Healthcare and Education