OpenAlex · Updated hourly · Last updated: 29.04.2026, 08:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Evaluating Large Language Models on Aerospace Medicine Principles

2025 · 1 citation · Wilderness and Environmental Medicine

Citations: 1 · Authors: 4 · Year: 2025

Abstract

, ChatGPT-4 had a mean reader score from 4.23 to 5.00 (Likert scale of 1-5) across chapters, whereas Gemini Advanced and the RAG LLM scored 3.30 to 4.91 and 4.69 to 5.00, respectively. When queried with 20 multiple-choice aerospace medicine board questions provided by the American College of Preventive Medicine, ChatGPT-4 and Gemini Advanced responded correctly 70% and 55% of the time, respectively, while the RAG LLM answered 85% correctly. Despite this quantitative measure of high performance, the LLMs tested still exhibited gaps in factual knowledge that potentially could be harmful, a degree of clinical reasoning that may not pass the aerospace medicine board exam, and some inconsistency when answering self-generated questions.

Conclusion

There is considerable promise for LLM use in autonomous medical operations in spaceflight given the anticipated continued rapid pace of development, including advancements in model training, data quality, and fine-tuning methods.
