This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

How do ChatGPT and other generative artificial intelligence models perform on foot and ankle questions from the Brazilian Orthopedics and Traumatology Association’s TEOT and TARO exams? The implications of large language models for medical education

2026 · 0 citations · Journal of the Foot & Ankle · Open Access

Abstract

Introduction: Generative artificial intelligence (AI) is increasingly used for study and rapid consultation. We assessed how leading large language models (LLMs) perform on Brazilian Orthopedics and Traumatology Association (SBOT) Foot and Ankle exam questions. Methods: Cross-sectional benchmarking of 107 foot and ankle questions from TEOT and TARO exams. Items were classified into the following categories: adult trauma, pediatric trauma, anatomy/imaging, physical examination, congenital/pediatric disorders, and adult disorders. Four generative AI models were queried with standardized prompts; responses were scored against the official key. Outcome: overall accuracy. Results: ChatGPT (GPT-5 Thinking) had the highest accuracy (86.91%), followed by Gemini (79.43%). Accuracy differed by domain, with lower performance in pediatric trauma and congenital disorders. No model achieved perfect agreement with the key. Conclusions: Popular generative AI models performed well on SBOT foot and ankle exam questions, with ChatGPT (GPT-5 Thinking) scoring highest. LLMs may be helpful adjuncts in residency education when used with supervision and critical appraisal.
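The scoring step described in the Methods (responses compared against the official key, reported as overall accuracy and broken down by category) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `score_responses`, the data layout, and the four toy questions are all hypothetical and do not come from the study.

```python
from collections import defaultdict


def score_responses(answer_key, responses, categories):
    """Return (overall_accuracy, per_category_accuracy) as fractions in [0, 1].

    answer_key: question id -> correct option (the official key)
    responses:  question id -> option chosen by the model
    categories: question id -> category label
    Assumes answer_key is non-empty; a missing response counts as incorrect.
    """
    correct_by_cat = defaultdict(int)
    total_by_cat = defaultdict(int)
    for qid, correct_option in answer_key.items():
        cat = categories[qid]
        total_by_cat[cat] += 1
        if responses.get(qid) == correct_option:
            correct_by_cat[cat] += 1
    overall = sum(correct_by_cat.values()) / len(answer_key)
    per_category = {c: correct_by_cat[c] / n for c, n in total_by_cat.items()}
    return overall, per_category


# Hypothetical toy data, NOT taken from the TEOT/TARO exams.
answer_key = {1: "A", 2: "C", 3: "B", 4: "D"}
categories = {
    1: "adult trauma",
    2: "adult trauma",
    3: "pediatric trauma",
    4: "anatomy/imaging",
}
responses = {1: "A", 2: "B", 3: "B", 4: "D"}

overall, per_category = score_responses(answer_key, responses, categories)
print(f"overall accuracy: {overall:.2%}")  # prints "overall accuracy: 75.00%"
```

With only 107 items, each question shifts overall accuracy by a little under one percentage point, and per-category figures rest on far fewer items, so differences between domains are best read with that granularity in mind.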

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Ultrasound in Clinical Applications
