OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 22.03.2026, 01:22

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Performance of GPT-4o combined with retrieval-augmented generation on nutritionist licensing exam questions

2025·0 Zitationen·Endocrine JournalOpen Access
Volltext beim Verlag öffnen

0

Zitationen

9

Autoren

2025

Jahr

Abstract

GPT-4o, a general-purpose large language model, has a Retrieval-Augmented Variant (GPT-4o-RAG) that can assist in dietary counseling. However, research on its application in this field remains lacking. To bridge this gap, we used the Japanese National Examination for Registered Dietitians as a standardized benchmark for evaluation. Three language models-GPT-4o, GPT-4o-mini, and GPT-4o-RAG-were assessed using 599 publicly available multiple-choice questions from the 2022-2024 national examinations. For each model, we generated answers to each question five times and based our evaluation on these multiple outputs to assess response variability and robustness. A custom pipeline was implemented for GPT-4o-RAG to retrieve guideline-based documents for integration with GPT-generated responses. Accuracy rates, variance, and response consistency were evaluated. Term Frequency-Inverse Document Frequency analysis was conducted to compare word characteristics in correctly and incorrectly answered questions. All three models achieved accuracy rates >60%, the passing threshold. GPT-4o-RAG demonstrated the highest accuracy (83.5% ± 0.3%), followed by GPT-4o (82.1% ± 1.0%), and GPT-4o-mini (70.0% ± 1.4%). While the accuracy improvement of GPT-4o-RAG over GPT-4o was not statistically significant (p = 0.12), it exhibited significantly lower variance and higher response consistency (97.3% vs. 91.2-95.2%, p < 0.001). GPT-4o-RAG outperformed other models in applied and clinical nutrition categories but showed limited performance on numerical questions. Term Frequency-Inverse Document Frequency analysis suggested that incorrect answers were more frequently associated with numerical terms. GPT-4o-RAG improved response consistency and domain-specific performance, suggesting utility in clinical nutrition. However, limitations in numerical reasoning and individualized guidance warrant further development and validation.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Educational Assessment and PedagogyEducational Technology and AssessmentArtificial Intelligence in Healthcare and Education
Volltext beim Verlag öffnen