This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing Large Language Models for Medical Question Answering in Portuguese: Open-Source Versus Closed-Source Approaches
Citations: 1
Authors: 1
Year: 2025
Abstract
Large language models (LLMs) show promise in medical knowledge assessment. This study benchmarked a closed-source (GPT-4o, OpenAI, San Francisco, CA) and an open-source (LLaMA 3.1 405B, Meta AI, Menlo Park, CA) LLM on 148 multiple-choice questions from the 2023 Portuguese National Residency Access Examination across five clinical domains. Using five distinct prompting strategies, models provided single-best-answer predictions. GPT-4o consistently outperformed LLaMA 3.1 by 7-11% accuracy across all prompts. Chain-of-thought prompting yielded the highest numerical accuracy for GPT-4o, though this improvement was not statistically significant over simpler prompts in post-hoc analyses, while offering minimal benefit when applied to LLaMA 3.1. Both models performed best in pediatrics and less accurately in surgery and psychiatry questions. Bias assessment indicated GPT-4o aligned well with correct answer distributions, unlike LLaMA 3.1, which showed prompt-dependent skew. Closed-source models currently demonstrate higher accuracy on Portuguese medical questions, likely due to extensive training. However, open-source models remain valuable for data control, though domain-focused fine-tuning may be needed for optimal performance in high-stakes applications.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,292 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,143 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,539 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,452 citations