This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Valutazione one-shot di Mistral7B sul nuovo benchmark EuropeMedQA (One-shot evaluation of Mistral7B on the new EuropeMedQA benchmark)

2025 · 0 citations · Recenti Progressi in Medicina

Abstract

Artificial intelligence (AI) adoption in healthcare is rising, and unbiased evaluation of large language models (LLMs) requires uncontaminated benchmarks. We evaluated Mistral-7B-Instruct-v0.1 on 1120 human-validated Italian medical multiple-choice questions (SSM). The model achieved 40.2% accuracy and a 38.8% F1 score on the dataset. Likely causes of this limited performance include English-centric instruction tuning, lack of medical domain knowledge, and prompt misalignment with the task format. These findings suggest that LLMs need further improvement before deployment.
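The abstract reports accuracy and an F1 score for multiple-choice answers but does not say how the F1 score was averaged. The sketch below shows one common way to compute both metrics, assuming macro averaging over the answer options. The function names, the option labels "A" to "D", and the toy data are illustrative assumptions and are not taken from the paper.

```python
def accuracy(gold, pred):
    """Fraction of questions where the predicted option equals the gold option."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("no questions to score")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-option F1 scores.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme was used.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # An option that never appears in gold or pred contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data invented for illustration only (not from the benchmark).
gold = ["A", "B", "C", "D", "A", "B"]
pred = ["A", "C", "C", "D", "B", "B"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred, 'ABCD'):.3f}")
```

With this scoring, a missing or unparseable model answer (for example `None`) simply counts as wrong. For scale, 40.2% accuracy on 1120 questions corresponds to about 450 correct answers.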

Topics

Artificial Intelligence in Healthcare and Education
Machine Learning in Healthcare
Clinical Reasoning and Diagnostic Skills
