Reply to: Domain-Specific LLMs in Clinical Medicine: Identifying Preoperative Frailty From Clinical Notes
Citations: 0
Authors: 2
Year: 2025
Abstract
We thank Üçdal et al. for their thoughtful letter [1] advocating for the development and use of domain-specific large language models (LLMs) in healthcare, in reference to our recent publication on the use of LLMs for identifying preoperative frailty among older adults using clinical notes [2]. They make interesting and valid points on how to ensure that the use of artificial intelligence (AI) in medicine is accurate, applicable, and transparent, just like any clinical tool that is developed and becomes widely used to evaluate and treat patients. We agree that identifying or building tools that specifically excel in clinical applications will be key to the future of AI as a clinical tool.

General-purpose LLMs, while powerful, may fall short in contextual understanding of medical text and in handling the unique clinical language used in medicine. These models are typically trained on broad internet corpora in which biomedical literature, electronic health records, and guideline-based knowledge make up only a small fraction. As a result, they may generate fluent but factually incorrect answers (i.e., hallucinations), a phenomenon that is particularly problematic in high-stakes clinical settings. In contrast, domain-specific models are pre-trained on curated biomedical text, peer-reviewed literature, and structured health data, and may reduce hallucination rates, increase the precision of medical terminology, and align more closely with established standards of care. Our study is one example showing that general-purpose models may underperform in healthcare-related tasks compared with specialized models tailored to clinical contexts. However, this is not always the case, as demonstrated by another study that found similar performance between domain-specific and general-purpose language models for identifying the need for preoperative cardiac evaluations [3]. Furthermore, general-purpose language models may be fine-tuned with clinical notes or used with optimized prompt engineering to improve performance on healthcare-related tasks [4]. Regardless, the concept is the same: leveraging LLMs for clinical tasks must take into consideration the knowledge base of the underlying foundation model when developing accurate AI-based tools for medicine.

We agree with the authors' point that we need to ensure international relevance when using LLMs as clinical tools. Even within healthcare, AI models trained on one subpopulation may be inaccurate and exhibit bias when applied to another patient population [5]. It follows that a model that performs well within one country, trained on one patient population, may not generalize globally, particularly when guidelines, documentation styles, and patient demographics vary. Just as we validate clinical guidelines across populations, so too must we evaluate LLMs to ensure safe, equitable application [6]. To properly use models, clinicians need to understand not only what the result is but why. To properly develop and use clinical tools, we must understand them; this is the ethical principle behind explainability. Moving forward, if we use AI models, it is important that we maintain transparency, as Üçdal et al. point out, with interpretability mechanisms to understand what models learn and why. We will hold LLMs and other AI tools to the same standard as all clinical tools used in medicine.
As with any medical advancement, those developing and implementing the tool bear responsibility for clinical validation, usability testing, post-deployment monitoring, and ongoing iteration based on real-world data. As demand on healthcare grows, AI technologies such as LLMs have emerged to streamline and improve care amid increasing workloads and shrinking resources. As we have seen across healthcare, including in our study using LLMs to identify a difficult-to-quantify state such as frailty, LLMs and other AI technologies show great potential for improving our ability to care for patients. As with any clinical tool, they must be proven to improve, not compromise, care. Along those lines, it is also critical to apply tools that are relevant and designed to perform well. Careful steps forward, applying only AI technologies appropriate to the proposed use, such as domain-specific LLMs, together with careful testing, validation, and model transparency, will ensure that we improve care rather than cause harm to our patients. In this way, clinicians can learn about a complex but powerful technology and lead healthcare in the best direction forward.

Y.Q.Z. and R.A.G. contributed to the concept design and preparation of the manuscript. The authors have nothing to report. The authors declare no conflicts of interest.

This publication is linked to a related Letter to the Editor by Üçdal et al. To view this article, visit https://doi.org/10.1111/jgs.70171.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,391 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,257 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,685 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,501 citations