This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The accuracy and repeatability of OpenEvidence on complex medical subspecialty scenarios: a pilot study
0 citations
4 authors
Year: 2025
Abstract
OpenEvidence is a popular artificial intelligence (AI) based medical search engine that generates evidence-based answers. It includes a quick search method (OE) that responds in seconds and provides a limited number of references. In mid-2025, the platform introduced "Deep Consult" (DC), which takes several minutes to respond and provides more comprehensive answers with additional references. OpenEvidence scored 100% on USMLE-type multiple-choice questions, but it has not been tested on more complex medical scenarios. We tested the OE and DC models using questions primarily derived from medical specialty board exams, specifically the MedXpertQA dataset. In a prior published study, this dataset was evaluated with eleven large language models (LLMs), and the results indicated poor accuracy (14-46%) for all LLMs. We evaluated the performance of OpenEvidence on a sample of the MedXpertQA dataset comprising 100 medical subspecialty scenarios, using two independent evaluators. The highest accuracy for DC was 41%, and for OE, 34%. Repeatability testing revealed an evaluator concordance rate of 77% for OE and 72% for DC.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations