OpenAlex · Updated hourly · Last updated: 12.03.2026, 03:51

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Benchmarking generative AI tools for interpretation of the WHO TB mutation catalogue

2026 · 0 citations · BMC Digital Health · Open Access
Open full text at the publisher

Citations: 0 · Authors: 4 · Year: 2026

Abstract

The World Health Organization (WHO) 2023 Mutation Catalogue for Mycobacterium tuberculosis is a crucial knowledgebase and tool for the clinical interpretation of mutations associated with drug-resistant TB. However, the document's complexity and size pose challenges for many users. This study evaluated the potential of generative artificial intelligence (AI) models to support natural-language interaction with the catalogue; it was a benchmarking study, not a clinical usability trial. Four prominent AI models (Google Gemini 2.5 Pro, OpenAI ChatGPT 4.1, Perplexity AI, and DeepSeek R1) were assessed on three kinds of task: general test questions; mutation search and retrieval, using both full-catalogue queries and antibiotic-specific tables; and the application of additional grading rules to score novel mutations. Performance was measured by accuracy, completeness, clarity, source citation, and the presence of hallucinations. Google Gemini 2.5 Pro consistently performed best in accuracy, completeness, and avoidance of hallucinations across most evaluations, especially in general queries and large-dataset searches. DeepSeek R1 excelled at applying grading rules to novel mutations and showed high accuracy on focused datasets, but produced some hallucinations. ChatGPT 4.1 was strong in clarity but lacked proper citations, and Perplexity AI performed variably, with more frequent hallucinations. The findings highlight the potential of AI tools to make complex knowledgebases such as the WHO Mutation Catalogue more accessible, while emphasizing the need for rigorous benchmarking. Although no model is yet suitable for direct clinical use, the results suggest that, with further development, models such as Google Gemini 2.5 Pro could form the basis of a custom AI agent to help users navigate this critical resource, ultimately contributing to improved TB control efforts.


Topics

Artificial Intelligence in Healthcare and Education · Genomics and Rare Diseases · Machine Learning in Healthcare