OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 15.03.2026, 09:48

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

<scp>AI</scp>'s ability to interpret unlabeled anatomy images and supplement educational research as an <scp>AI</scp> rater

2025·1 Zitationen·Anatomical Sciences EducationOpen Access
Volltext beim Verlag öffnen

1

Zitationen

10

Autoren

2025

Jahr

Abstract

Evidence suggests custom chatbots are superior to commercial generative artificial intelligence (GenAI) systems for text-based anatomy content inquiries. This study evaluates ChatGPT-4o's and Claude 3.5 Sonnet's capabilities to interpret unlabeled anatomical images. Secondarily, ChatGPT o1-preview was evaluated as an AI rater to grade AI-generated outputs using a rubric and was compared against human raters. Anatomical images (five musculoskeletal, five thoracic) representing diverse image-based media (e.g., illustrations, photographs, MRI) were annotated with identification markers (e.g., arrows, circles) and uploaded to each GenAI system for interpretation. Forty-five prompts (i.e., 15 first-order, 15 second-order, and 15 third-order questions) with associated images were submitted to both GenAI systems across two timepoints. Responses were graded by anatomy experts for factual accuracy and superfluity (the presence of excessive wording) on a three-point Likert scale. ChatGPT o1-preview was tested for agreement against human anatomy experts to determine its usefulness as an AI rater. Statistical analyses included inter-rater agreement, hierarchical linear modeling, and test-retest reliability. ChatGPT-4o's factual accuracy score across 45 outputs was 68.0% compared to Claude 3.5 Sonnet's score of 61.5% (p = 0.319). As an AI rater, ChatGPT o1-preview showed moderate to substantial agreement with human raters (Cohen's kappa = 0.545-0.755) for evaluating factual accuracy according to a rubric of textbook answers. Further improvements and evaluations are needed before commercial GenAI systems can be used as credible student resources in anatomy education. Similarly, ChatGPT o1-preview demonstrates promise as an AI assistant for educational research, though further investigation is warranted.

Ähnliche Arbeiten