This is an overview page with metadata for this scientific publication. The full article is available from the publisher.
Towards Safe and Trustworthy Healthcare AI: Risk Assessment of Medical Dialogue Using LLMs
Citations: 0 · Authors: 1 · Year: 2025
Abstract
Large Language Models (LLMs) are increasingly used in healthcare settings, yet concerns remain regarding their ability to handle medical dialogues safely and reliably. To address this issue, this study introduces a quantitative framework for evaluating the safety and trustworthiness of LLMs in multilingual medical dialogues. Using the German subtask of the NTCIR-18 MedNLP-CHAT dataset, we examined how 13 LLMs (encompassing general-purpose, open-source, and biomedical variants) identify medical, ethical, and legal risks. ROC-AUC-based statistical validation (one-sample t-tests and sign tests) was applied to ensure robust and reproducible evaluation. Results show that gpt-5, gpt-4o, gpt-3.5-Turbo, gpt-oss:120b, gpt-oss:20b, and gemma-3:27b consistently achieved reliable performance, while smaller and domain-specific models often failed to generalize across languages and risk types. These findings suggest that model scale and multi-domain safety alignment are key to achieving trustworthy risk reasoning in clinical dialogues, and they provide guidance for conservative deployment of LLMs in healthcare.
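The validation scheme the abstract describes can be illustrated with a short sketch. The snippet below is a minimal, illustrative reconstruction, not the authors' actual code: it assumes per-risk-type binary labels and model scores, computes a ROC-AUC for each risk type with scikit-learn, and then tests whether the AUCs exceed the 0.5 chance level with a one-sample t-test and a sign test from SciPy. All function names, variable names, and data shapes are assumptions.

```python
# Minimal sketch (assumed, not the authors' code): ROC-AUC-based
# validation with a one-sample t-test and a sign test against the
# chance level of AUC = 0.5.
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

def validate_model(y_true_per_risk, y_score_per_risk, chance=0.5):
    """Test whether a model's per-risk-type AUCs exceed chance level.

    y_true_per_risk / y_score_per_risk: lists of arrays, one pair per
    risk type (e.g. medical, ethical, legal) -- an illustrative layout.
    """
    aucs = np.array([
        roc_auc_score(y_true, y_score)
        for y_true, y_score in zip(y_true_per_risk, y_score_per_risk)
    ])
    # One-sample t-test: is the mean AUC significantly above chance?
    _, t_p = stats.ttest_1samp(aucs, chance, alternative="greater")
    # Sign test: number of AUCs above chance, under a binomial null.
    wins = int((aucs > chance).sum())
    sign_p = stats.binomtest(wins, n=len(aucs), p=0.5,
                             alternative="greater").pvalue
    return aucs, t_p, sign_p

# Hypothetical usage with random data for three risk types.
rng = np.random.default_rng(0)
labels = [rng.integers(0, 2, 100) for _ in range(3)]
scores = [rng.random(100) for _ in range(3)]
print(validate_model(labels, scores))
```

Under this reading, a model would count as reliably above chance only if both tests reject the null, which is one plausible interpretation of the abstract's conservative evaluation criterion.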
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations