This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology
Citations: 23
Authors: 25
Year: 2024
Abstract
Objective: This study investigated whether Large Language Models (LLMs) can provide accurate and consistent answers in complex gynecologic cancer cases.
Background: LLMs are advancing rapidly and require thorough evaluation to ensure that they can be used safely and effectively in clinical decision-making. Such evaluations are essential for confirming that LLMs are reliable and accurate enough to support medical professionals in casework.
Study design: We assessed three prominent LLMs, ChatGPT-4 (CG-4), Gemini Advanced (GemAdv), and Copilot, evaluating their accuracy, consistency, and overall performance. Fifteen clinical vignettes of varying difficulty and five open-ended questions based on real patient cases were used. The responses were coded, randomized, and evaluated blindly by six expert gynecologic oncologists using a 5-point Likert scale for relevance, clarity, depth, focus, and coherence.
Results: GemAdv demonstrated superior accuracy (81.87%) compared to both CG-4 (61.60%) and Copilot (70.67%) across all difficulty levels. GemAdv also provided correct answers more frequently and consistently (>60% on every day of the testing period). Although CG-4 showed a slight advantage in adhering to the National Comprehensive Cancer Network (NCCN) treatment guidelines, GemAdv excelled in the depth and focus of its answers, which are crucial aspects of clinical decision-making.
Conclusion: LLMs, especially GemAdv, show potential in supporting clinical practice by providing accurate, consistent, and relevant information for gynecologic cancer. However, further refinement is needed for more complex scenarios. This study highlights the promise of LLMs in gynecologic oncology, emphasizing the need for ongoing development and rigorous evaluation to maximize their clinical utility and reliability.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,545 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,436 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,935 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,589 citations
Authors
- Khanisyah Erza Gumilar
- Birama Robby Indraprasta
- Ach Salman Faridzi
- Bagus Mukti Wibowo
- Aditya Herlambang
- Eccita Rahestyningtyas
- Budi Irawan
- Zulkarnain Tambunan
- Ahmad Fadhli Bustomi
- Bagus Ngurah Brahmantara
- Zih-Ying Yu
- Yu-Cheng Hsu
- Herlangga Pramuditya
- Very Great Eka Putra
- Hari Nugroho
- Pungky Mulawardhana
- Brahmana Askandar Tjokroprawiro
- Tri Hedianto
- Ibrahim H. Ibrahim
- Jingshan Huang
- Dongqi Li
- Chien‐Hsing Lu
- Jer‐Yen Yang
- Li-Na Liao
- Ming Jen Tan