OpenAlex · Updated hourly · Last updated: 17.04.2026, 07:26

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Arkangel AI: A conversational agent for real-time, evidence-based medical question-answering

2025 · 1 citation · Intelligence-Based Medicine · Open Access

Citations: 1 · Authors: 7 · Year: 2025

Abstract

Large Language Models (LLMs) have been trained and tested on several medical question-answering (QA) datasets built from medical licensing exams and natural interactions between doctors and patients to fine-tune them for specific health-related tasks. We aimed to develop LLM-powered Conversational Agents (CAs) equipped to produce fast, accurate, and real-time responses to medical queries in different clinical and scientific scenarios. This paper presents MedSearch, our first conversational agent and research assistant. The model is based on a system containing five LLMs; each is classified within a specific workflow with pre-defined instructions to produce the best search strategy and provide evidence-based answers. We assessed accuracy, intra/inter-class variability, and Cohen's kappa using the question-answering dataset MedQA. Additionally, we used the PubMedQA dataset and assessed both datasets using the RAGAS framework, including Context, Response Relevance, and Faithfulness. Traditional statistical analysis was performed with hypothesis tests and 95% confidence intervals (CI). Accuracy for MedQA (n = 1273) was 90.26% and Cohen's kappa was 87%, surpassing the current state of the art (SoTA) for other LLMs (GPT-4o, MedPaLM2). The model retrieved 80% of the expected articles and provided relevant answers for 82% of PubMedQA questions. MedSearch showed proficient retrieval and reasoning abilities and unbiased responses. Evenly distributed medical QA datasets to train improved LLMs and external validation of the model with real-world physicians in clinical scenarios are needed. Clinical decision-making remains in the hands of trained healthcare professionals.

• We built a five-LLM pipeline using retrieved and background information that achieved the best accuracy.
• MedSearch achieves state-of-the-art performance on the MedQA dataset.
• MedSearch shows exemplary comprehension, reasoning and retrieval abilities on various medical topics.
• Despite good performance, final decisions must be made by human medical experts.
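
The abstract reports accuracy, Cohen's kappa, and 95% confidence intervals for MedQA without saying how they were computed. The following is a minimal Python sketch of how such figures are commonly obtained, not the authors' code. Several things here are assumptions: the function names are hypothetical, the Wilson score interval is one possible choice of interval (the abstract does not name one), the two label sequences passed to `cohens_kappa` could be model answers versus the answer key or two repeated runs (the abstract does not specify which), and the count of 1149 correct answers is only inferred as being consistent with 90.26% of 1273.

```python
import math
from collections import Counter


def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if not a or len(a) != len(b):
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two marginal label frequencies, summed.
    p_expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_expected == 1:
        return 1.0  # both sequences use a single identical label
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy multiple-choice labels, invented purely for illustration.
    gold = list("ABCDABCD")
    pred = list("ABCDABCA")
    print("accuracy:", accuracy(gold, pred))
    print("kappa:", cohens_kappa(gold, pred))
    # Assumed count: 1149 of 1273 correct is consistent with the reported 90.26%.
    print("95% Wilson interval:", wilson_interval(1149, 1273))
```

Kappa is lower than raw agreement whenever some agreement would be expected by chance, which is why the abstract's 87% kappa sits below its 90.26% accuracy.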

Topics

Topic Modeling · AI in Service Interactions · Artificial Intelligence in Healthcare and Education