This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Benchmarking generative AI tools for interpretation of the WHO TB mutation catalogue
0
Citations
4
Authors
2026
Year
Abstract
The World Health Organization (WHO) 2023 Mutation Catalogue for Mycobacterium tuberculosis is a crucial knowledgebase and tool for clinical interpretation of mutations associated with drug-resistant TB. However, the document's complexity and size pose challenges for many users. This study evaluated the potential of generative artificial intelligence (AI) models to facilitate natural language user interaction with the catalogue. This was a benchmarking study, not a clinical usability trial. Four prominent AI models (Google Gemini 2.5 Pro, OpenAI ChatGPT 4.1, Perplexity AI, and DeepSeek R1) were assessed through general test questions, mutation search and retrieval tasks using both full catalogue queries and antibiotic-specific tables, and the application of additional grading rules to score novel mutations. Performance was measured based on accuracy, completeness, clarity, source citation, and the presence of hallucinations. Google Gemini 2.5 Pro consistently demonstrated superior performance in accuracy, completeness, and avoidance of hallucinations across most evaluations, especially in general queries and large dataset searches. DeepSeek R1 excelled in applying grading rules to novel mutations and showed high accuracy in focused datasets, but exhibited some hallucinations. ChatGPT 4.1 was strong in clarity but lacked proper citations, and Perplexity AI showed variable performance with a higher frequency of hallucinations. The findings highlight the potential of AI tools to enhance the accessibility of complex knowledgebases like the WHO Mutation Catalogue, while emphasizing the need for rigorous benchmarking. While no model is yet suitable for direct clinical use, the results suggest that with further development, models like Google Gemini 2.5 Pro could form the basis of a custom AI agent to assist users in navigating this critical resource, ultimately contributing to improved TB control efforts.