Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

2024·5 Zitationen·arXiv (Cornell University)Open Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2024

Jahr

Abstract

Artificial intelligence has significantly advanced healthcare, particularly through large language models (LLMs) that excel in medical question answering benchmarks. However, their real-world clinical application remains limited due to the complexities of doctor-patient interactions. To address this, we introduce \textbf{AI Hospital}, a multi-agent framework simulating dynamic medical interactions between \emph{Doctor} as player and NPCs including \emph{Patient}, \emph{Examiner}, \emph{Chief Physician}. This setup allows for realistic assessments of LLMs in clinical scenarios. We develop the Multi-View Medical Evaluation (MVME) benchmark, utilizing high-quality Chinese medical records and NPCs to evaluate LLMs' performance in symptom collection, examination recommendations, and diagnoses. Additionally, a dispute resolution collaborative mechanism is proposed to enhance diagnostic accuracy through iterative discussions. Despite improvements, current LLMs exhibit significant performance gaps in multi-turn interactions compared to one-step approaches. Our findings highlight the need for further research to bridge these gaps and improve LLMs' clinical diagnostic capabilities. Our data, code, and experimental results are all open-sourced at \url{https://github.com/LibertFan/AI_Hospital}.

Autoren

Themen

Artificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator

Abstract

Ähnliche Arbeiten

Autoren

Themen