This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Enhancing Large Language Models (LLM) Performance in Nephrology through Prompt Engineering: A Comparative Analysis of ChatGPT-4 Responses in Answering AKI and Critical Care Nephrology Questions
Citations: 0
Authors: 7
Year: 2024
Abstract
Background: Large Language Models (LLMs) have significantly advanced the field of artificial intelligence (AI). The effectiveness of LLMs is substantially influenced by the structure and formulation of input queries, a process known as prompt engineering. Prompt engineering techniques such as the chain of thought approach, which involves reasoning through problems step by step, have shown promising accuracy gains over regular prompts. This study investigates the impact of the chain of thought approach on the accuracy of ChatGPT-4 in addressing acute kidney injury (AKI) and critical care nephrology questions.

Methods: We presented ChatGPT-4 with 101 questions from the Kidney Self-Assessment Program (KSAP) and Nephrology Self-Assessment Program (NephSAP). We employed two prompting methods: one using the original question and the other using the chain of thought approach. The McNemar test was used to assess differences in accuracy, while Cohen's kappa was used to evaluate agreement between the two prompting methods.

Results: ChatGPT-4 demonstrated an accuracy of 87.1% with chain of thought prompting, outperforming the 81.2% accuracy achieved with regular prompting (P=0.15). The kappa statistic for agreement between the two prompting methods was 0.80. Consistency between the two methods was observed in 84.2% of the questions, with 78.2% answered correctly by both methods. Chain of thought prompting correctly answered nine questions that were missed under regular prompting. Among the thirteen questions missed under chain of thought prompting, a notable 76.9% were errors repeated from regular prompting. Only three questions answered incorrectly with chain of thought prompting were answered correctly under regular prompting.

Conclusion: The chain of thought approach improves ChatGPT-4's accuracy in addressing nephrology-related questions compared to regular prompting, although the difference is not statistically significant.
These findings emphasize the importance of developing effective prompting strategies to optimize the application of LLMs in clinical decision support. Future research should aim to generalize these findings across different medical specialties to maximize the benefits of LLMs in clinical decision-making.
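The McNemar test reported in the abstract depends only on the discordant pairs: the nine questions answered correctly only with chain of thought prompting and the three answered correctly only with regular prompting. A minimal sketch of the exact (binomial) form of the test, using only these two counts from the abstract (the function name is illustrative):

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on the discordant pairs.

    b = questions correct only under chain of thought prompting,
    c = questions correct only under regular prompting.
    Under the null hypothesis the b+c discordant outcomes are
    split 50/50, so this is an exact binomial test with p = 0.5.
    """
    n = b + c
    k = min(b, c)
    # Sum the binomial tail up to the smaller count, then double for two sides.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Discordant counts reported in the abstract: 9 vs. 3.
p_value = mcnemar_exact(9, 3)
print(round(p_value, 2))  # → 0.15, matching the reported P = 0.15
```

With 9 versus 3 discordant questions the exact two-sided p-value is about 0.146, consistent with the non-significant P=0.15 reported in the Results.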
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations