This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the Proficiency of Large Language Models on Funduscopic Disease Knowledge (Preprint)
Citations: 0
Authors: 11
Year: 2024
Abstract
BACKGROUND: Large language models (LLMs) have significantly transformed the field of natural language processing, with cutting-edge models like ChatGPT currently leading the way in medical AI.

OBJECTIVE: This study aimed to assess the performance of five LLMs (GPT-3.5, GPT-4, PaLM2, Claude 2, and SenseNova) against two human cohorts (a group of funduscopic disease experts and a group of ophthalmologists) on the specialized subject of funduscopic disease.

METHODS: The five LLMs and the two human groups independently completed a 100-item funduscopic disease test. Performance was assessed by comparing average scores, response stability, and answer confidence.

RESULTS: Among the LLMs, GPT-4 and PaLM2 showed the highest average correlation. GPT-4 also achieved the highest average score and the highest answer confidence on the exam. Compared with the human cohorts, GPT-4 performed on par with the ophthalmologists but fell short of the funduscopic disease specialists.

CONCLUSIONS: The study provides evidence of the strong performance of GPT-4 in the domain of funduscopic disease. With continued improvement, validated LLMs could bring unforeseen benefits to healthcare for both patients and physicians.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,214 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,071 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,429 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,418 citations