This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the Precision of AI-Generated Medical Answers: An Evaluation of LLaMA-3 Powered Meta AI
Citations: 0
Authors: 2
Year: 2025
Abstract
Background & Objective: Medical students frequently use Meta AI to resolve queries and answer practice questions because it is easily available. This study assesses the correctness of Meta AI-generated answers to medical questions and the reproducibility of its results.

Method: The study uses an evaluation research design to assess the quality and effectiveness of Meta AI. The questionnaire comprised 240 MCQs, 30 from each of eight subjects. Of these, 108 were case-based (Category A) and 132 were fact-based (Category B). Meta AI was re-queried with the previously failed questions 14 days later. Results were analyzed manually, and accuracies were evaluated using IBM SPSS version 27.

Results: In the initial analysis, Meta AI correctly answered 187 of 240 questions (overall accuracy = 77.9%). Category A was answered more accurately (82.4%) than Category B (74.2%). In the re-scored analysis, Meta AI correctly answered only 12 of the 53 previously failed questions, an overall reproducibility of 22.6%. The re-scored accuracy was 47.3% for Category A and 8.8% for Category B.

Conclusion: The integration of AI into medicine is advancing rapidly, and models like Meta AI represent significant strides in making medical information more accessible and accurate. Despite these promising results, the study has notable limitations, such as the scope of questions, the subjects covered, and potential selection biases.

Table 1. Overview of Total and Categorical Validation from Initial Analysis

| Subject | Correct answers | Wrong answers | Category A wrong, n (total) | Category B wrong, n (total) | Accuracy (%) |
|---|---|---|---|---|---|
| Anatomy | 20 | 10 | 1 (8) | 9 (22) | 66.6 |
| Biochemistry | 26 | 4 | 3 (16) | 1 (14) | 86.6 |
| Community Medicine | 20 | 10 | 5 (10) | 5 (20) | 66.6 |
| Forensic Medicine | 18 | 12 | 0 (0) | 12 (30) | 60.0 |
| Microbiology | 27 | 3 | 1 (16) | 2 (14) | 90.0 |
| Pathology | 29 | 1 | 1 (29) | 0 (1) | 96.6 |
| Pharmacology | 26 | 4 | 3 (15) | 1 (15) | 86.6 |
| Physiology | 21 | 9 | 5 (14) | 4 (16) | 70.0 |
| Total | 187 | 53 | 19 (108) | 34 (132) | 77.9 |

Table 2. Overview of Total and Categorical Validation from Re-scored Analysis

| Subject | Total questions* | Correct answers | Wrong answers | Category A wrong, n (total) | Category B wrong, n (total) | Accuracy (%) |
|---|---|---|---|---|---|---|
| Anatomy | 10 | 1 | 9 | 0 (1) | 9 (9) | 10.0 |
| Biochemistry | 4 | 1 | 3 | 2 (3) | 1 (1) | 25.0 |
| Community Medicine | 10 | 3 | 7 | 3 (5) | 4 (5) | 30.0 |
| Forensic Medicine | 12 | 1 | 11 | 0 (0) | 11 (12) | 8.3 |
| Microbiology | 3 | 2 | 1 | 0 (1) | 1 (2) | 66.6 |
| Pathology | 1 | 0 | 1 | 1 (1) | 0 (0) | 0.0 |
| Pharmacology | 4 | 3 | 1 | 0 (3) | 1 (1) | 75.0 |
| Physiology | 9 | 1 | 8 | 4 (5) | 4 (4) | 11.1 |
| Total | 53 | 12 | 41 | 10 (19) | 31 (34) | 22.6 |

* Total questions are the questions that were answered incorrectly in the first (initial) attempt.
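The category accuracies quoted in the abstract can be recomputed from the totals rows of Table 1 and Table 2. The following is a minimal sketch, assuming that the "n" in each category column counts wrong answers (in every row the two n values sum to the wrong-answer count). The names `INITIAL`, `RESCORED`, `accuracy`, and `summarize` are illustrative and do not come from the paper, which reports using IBM SPSS.

```python
# Counts copied from the totals rows of Table 1 and Table 2.
# Each pair is (wrong answers, total questions) for one category.
# Assumption: "n" in the category columns is the number of wrong answers.
INITIAL = {"A": (19, 108), "B": (34, 132)}
RESCORED = {"A": (10, 19), "B": (31, 34)}


def accuracy(wrong, total):
    """Return the percentage of correctly answered questions."""
    return 100.0 * (total - wrong) / total


def summarize(counts):
    """Return per-category accuracy plus the pooled 'overall' accuracy."""
    result = {name: accuracy(w, t) for name, (w, t) in counts.items()}
    wrong = sum(w for w, _ in counts.values())
    total = sum(t for _, t in counts.values())
    result["overall"] = accuracy(wrong, total)
    return result


if __name__ == "__main__":
    for label, counts in (("initial", INITIAL), ("re-scored", RESCORED)):
        for name, value in summarize(counts).items():
            print(f"{label:10s} {name:8s} {value:5.1f}%")
```

Under this reading, the sketch matches the reported 82.4%, 74.2%, and 77.9% for the initial analysis and 8.8% and 22.6% for the re-scored analysis. For Category A in the re-scored analysis, 9 of 19 is about 47.37%, which rounds to 47.4%; the reported 47.3% appears to be truncated rather than rounded, as are table values such as 66.6 and 86.6.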
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,336 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,207 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,607 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,476 citations