This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
The accuracy and repeatability of OpenEvidence on complex medical subspecialty scenarios: a pilot study
0 citations
4 authors
Year: 2025
Abstract
OpenEvidence is a popular artificial intelligence (AI) based medical search engine that generates evidence-based answers. It includes a quick search method (OE) that responds in seconds and provides a limited number of references. In mid-2025, the platform introduced "Deep Consult" (DC), which takes several minutes to respond and provides more comprehensive answers with additional references. OpenEvidence scored 100% on USMLE-type multiple-choice questions, but it has not been tested on more complex medical scenarios. We tested the OE and DC models using questions primarily derived from medical specialty board exams, specifically the MedXpertQA dataset. In a prior published study, this dataset was evaluated with eleven large language models (LLMs), and the results indicated poor accuracy (14-46%) for all LLMs. We evaluated the performance of OpenEvidence on a sample of the MedXpertQA dataset comprising 100 medical subspecialty scenarios, using two independent evaluators. The highest accuracy for DC was 41%, and for OE, 34%. Repeatability testing revealed an evaluator concordance rate of 77% for OE and 72% for DC.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,245 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,102 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,468 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,429 citations