This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
How do ChatGPT and other generative artificial intelligence models perform on foot and ankle questions from the Brazilian Orthopedics and Traumatology Association’s TEOT and TARO exams? The implications of large language models for medical education
Citations: 0
Authors: 5
Year: 2026
Abstract
Introduction: Generative artificial intelligence (AI) is increasingly used for study and rapid consultation. We assessed how leading large language models (LLMs) perform on Brazilian Orthopedics and Traumatology Association (SBOT) Foot and Ankle exam questions. Methods: Cross-sectional benchmarking of 107 foot and ankle questions from TEOT and TARO exams. Items were classified into the following categories: adult trauma, pediatric trauma, anatomy/imaging, physical examination, congenital/pediatric disorders, and adult disorders. Four generative AI models were queried with standardized prompts; responses were scored against the official key. Outcome: overall accuracy. Results: ChatGPT (GPT-5 Thinking) had the highest accuracy (86.91%), followed by Gemini (79.43%). Accuracy differed by domain, with lower performance in pediatric trauma and congenital disorders. No model achieved perfect agreement with the key. Conclusions: Popular generative AI models performed well on SBOT foot and ankle exam questions, with ChatGPT (GPT-5 Thinking) scoring highest. LLMs may be helpful adjuncts in residency education when used with supervision and critical appraisal.
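The scoring step described in the Methods (responses compared against the official key, then summarized as overall and per-domain accuracy) can be illustrated with a minimal sketch. The function name `score_responses`, the data layout, and the toy questions are hypothetical and are not taken from the study, whose actual scoring procedure is not described on this page.

```python
from collections import defaultdict


def score_responses(responses, answer_key, categories):
    """Return (overall accuracy, per-category accuracy) as fractions.

    responses, answer_key: dicts mapping question id -> chosen/correct option.
    categories: dict mapping question id -> category label.
    An unanswered question counts as incorrect.
    """
    correct = 0
    per_cat = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for qid, key in answer_key.items():
        hit = responses.get(qid) == key
        correct += hit
        per_cat[categories[qid]][0] += hit
        per_cat[categories[qid]][1] += 1
    overall = correct / len(answer_key)
    by_category = {c: n_ok / n for c, (n_ok, n) in per_cat.items()}
    return overall, by_category


# Hypothetical toy data for illustration only (not from the study).
answer_key = {1: "A", 2: "C", 3: "B", 4: "D"}
categories = {
    1: "adult trauma",
    2: "adult trauma",
    3: "pediatric trauma",
    4: "anatomy/imaging",
}
responses = {1: "A", 2: "B", 3: "B", 4: "D"}

overall, by_category = score_responses(responses, answer_key, categories)
print(f"overall accuracy: {overall:.2%}")
for category, acc in by_category.items():
    print(f"{category}: {acc:.2%}")
```

With 107 questions, each item contributes just under one percentage point to a model's overall accuracy, so small differences between models correspond to only a handful of questions.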
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,557 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,447 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,944 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations