This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Enhancing Large Language Models (LLM) Performance in Nephrology through Prompt Engineering: A Comparative Analysis of ChatGPT-4 Responses in Answering AKI and Critical Care Nephrology Questions
Citations: 0
Authors: 7
Year: 2024
Abstract
Background: Large Language Models (LLMs) have significantly advanced the field of artificial intelligence (AI). The effectiveness of LLMs is substantially influenced by the structure and formulation of input queries, a process known as prompt engineering. Prompt engineering techniques such as the chain of thought approach, which involves reasoning through problems step by step, have shown promising accuracy gains over regular prompts. This study investigates the impact of the chain of thought approach on the accuracy of ChatGPT-4 in addressing acute kidney injury (AKI) and critical care nephrology questions.

Methods: We presented ChatGPT-4 with 101 questions from the Kidney Self-Assessment Program (KSAP) and Nephrology Self-Assessment Program (NephSAP). We employed two prompting methods: one using the original question and the other using the chain of thought approach. The McNemar test was used to assess differences in accuracy, while Cohen's kappa was used to evaluate agreement between the two prompting methods.

Results: ChatGPT-4 demonstrated an accuracy of 87.1% with chain of thought prompting, outperforming the 81.2% accuracy achieved with regular prompting (P=0.15). The kappa statistic for agreement between the two prompting methods was 0.80. Consistency between the two methods was observed in 84.2% of the questions, with 78.2% answered correctly by both methods. Chain of thought prompting correctly answered nine questions that were missed under regular prompting. Among the thirteen questions missed under chain of thought prompting, a notable 76.9% were errors repeated from regular prompting. Only three questions answered incorrectly with chain of thought prompting were answered correctly under regular prompting.

Conclusion: The chain of thought approach improves ChatGPT-4's accuracy in addressing nephrology-related questions compared to regular prompting, although the difference is not statistically significant.
These findings emphasize the importance of developing effective prompting strategies to optimize the application of LLMs in clinical decision support. Future research should aim to generalize these findings across different medical specialties to maximize the benefits of LLMs in clinical decision-making.
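The McNemar test reported in the abstract depends only on the discordant pairs: the nine questions answered correctly only with chain of thought prompting and the three answered correctly only with regular prompting. A minimal sketch of the exact (binomial) form of the test, using only these two counts from the abstract (the function name is illustrative):

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar test on the discordant pairs.

    b = questions correct only under chain of thought prompting,
    c = questions correct only under regular prompting.
    Under the null hypothesis the b+c discordant outcomes are
    split 50/50, so this is an exact binomial test with p = 0.5.
    """
    n = b + c
    k = min(b, c)
    # Sum the binomial tail up to the smaller count, then double for two sides.
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Discordant counts reported in the abstract: 9 vs. 3.
p_value = mcnemar_exact(9, 3)
print(round(p_value, 2))  # → 0.15, matching the reported P = 0.15
```

With 9 versus 3 discordant questions the exact two-sided p-value is about 0.146, consistent with the non-significant P=0.15 reported in the Results.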
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,316 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,177 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,575 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,468 citations