OpenAlex · Updated hourly · Last updated: 17.03.2026, 01:14

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Are You Asking GPT-4 Medical Questions Properly? - Prompt Engineering in Consistency and Reliability with Evidence-Based Guidelines for ChatGPT-4: A Pilot Study

2023 · 9 citations · Open Access
Open full text at the publisher

9 citations · 7 authors · Year: 2023

Abstract

Background: GPT-4 is a newly developed large language model that has been preliminarily applied in the medical field. However, relevant theoretical knowledge from computer science has not been effectively transferred to its medical applications.

Objective: To explore the application of prompt engineering in GPT-4 and to examine the reliability of GPT-4.

Methods: Prompts of different styles were designed and used to ask GPT-4 questions about agreement with the American Academy of Orthopaedic Surgeons (AAOS) osteoarthritis (OA) evidence-based guidelines. Each question was asked 5 times. Consistency with the guidelines was compared across evidence levels for the different prompts, and the reliability of each prompt was assessed by asking the same question 5 times.

Results: The ROT-style prompt performed significantly better for strong recommendations, with a total consistency of 77.5%, and showed steadier performance than the other prompts at the remaining evidence levels. The reliability of GPT-4 across prompts was not stable (Fleiss' kappa ranged from 0.334 to 0.525, and Kendall's coefficient of concordance ranged from 0.701 to 0.814).

Conclusions: Prompt engineering can improve GPT-4's performance in medicine. The reliability of GPT-4 in answering medical questions remains unclear, and further research is necessary.
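The reliability analysis described above scores agreement among the 5 repeated answers per question with Fleiss' kappa. A minimal sketch of that computation, using the standard Fleiss' kappa formula with invented toy tallies (the function name, category labels, and data are illustrative assumptions, not the study's actual data):

```python
# Hypothetical sketch: Fleiss' kappa over repeated GPT-4 answers per question.
# Each row tallies one question's 5 runs across answer categories
# (e.g. agree / disagree / uncertain); the numbers below are invented.
def fleiss_kappa(counts):
    """counts: per-question category tallies; each row sums to n repetitions."""
    N = len(counts)                      # number of questions
    n = sum(counts[0])                   # repetitions per question (5 in the study)
    k = len(counts[0])                   # number of answer categories
    # overall proportion of answers in each category
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # observed agreement within each question's repetitions
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N                 # mean observed agreement
    P_e = sum(p * p for p in p_j)        # chance-expected agreement
    return (P_bar - P_e) / (1 - P_e)

# toy data: 4 questions x [agree, disagree, uncertain], 5 runs each
data = [[5, 0, 0], [4, 1, 0], [5, 0, 0], [1, 4, 0]]
print(round(fleiss_kappa(data), 3))  # → 0.467
```

Values near the study's reported 0.334 to 0.525 range, as in this toy example, indicate only fair-to-moderate run-to-run agreement, which is why the abstract calls GPT-4's reliability unstable.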

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Clinical Reasoning and Diagnostic Skills