This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Artificial Intelligence in Relation to Accurate Information and Tasks in Gynecologic Oncology and Clinical Medicine – Dunning-Kruger Effects and Ultracrepidarianism
Citations: 1
Authors: 6
Year: 2025
Abstract
We have published on the accuracy of the Google Virtual Assistant, Alexa, Siri, Cortana, Gemini, and Copilot. Emerging from this published work was a focus on the accuracy of AI as determined through validation. In our work published in 2023, the accuracy of responses to a panel of 24 queries related to gynecologic oncology was low, with the Google Virtual Assistant (VA) providing the most correct audible replies (18.1%), followed by Alexa (6.5%), Siri (5.5%), and Cortana (2.3%). In the months following that publication, there was explosive excitement about several generative AIs, which continue to transform the landscape of information accessibility by presenting search results in impressively engaging narratives. This type of presentation has been enabled by combining machine learning algorithms with natural language processing (NLP). In 2024, we published our exploration of the generative AIs Gemini and Copilot, as well as the Google Assistant, in relation to how accurately they responded to the panel of 24 queries used in the 2023 publication. Google Gemini achieved an 87.5% accuracy rate, while the accuracy of Microsoft Copilot was 83.3%. In contrast, the Google VA's accuracy in audible responses improved from 18% in the 2023 report to 63% in 2024. Building on our investigations in this area, this review examines the accuracy of results obtained through different AI models. The findings reviewed here draw on a survey of 252 papers published in 2024 reporting on AI in medicine, of which 83 articles are considered in the present review because they contain evidence-based findings. In particular, the cases considered deal with AI accuracy in initial differential diagnoses, cancer treatment recommendations, board-style exams, and performance in various clinical tasks. Importantly, summaries of the validation techniques used to evaluate AI findings are presented.
This review focuses on those AIs whose clinical relevance is evidenced by application and evaluation in clinical publications. This relevance speaks to both what has been promised and what has been delivered by various AI systems. Readers will be able to recognize when a generative AI is expressing views without having the necessary information (ultracrepidarianism) or is responding as if it had expert knowledge when it does not. Without an awareness that AIs may deliver inadequate or confabulated information, incorrect medical decisions and inappropriate clinical applications can result (the Dunning-Kruger effect). As a result, in certain cases a generative AI might underperform while providing results that greatly overestimate its medical or clinical validity.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations