This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Domain-Specific LLMs in Clinical Medicine: Identifying Preoperative Frailty From Clinical Notes
Citations: 0
Authors: 4
Year: 2025
Abstract
The rapid evolution of artificial intelligence in healthcare has reached a critical juncture where we must address a fundamental question: should we rely on general-purpose large language models adapted for medical tasks, or invest in developing specialized models designed specifically for clinical applications? Domain-specific large language models (LLMs) are models trained or fine-tuned on clinical corpora (such as PubMed abstracts or electronic health records) to enhance performance on specialized medical tasks. Recent publications in leading medical journals provide compelling evidence that domain-specific models represent not merely an incremental improvement, but a necessary evolution in medical AI [1, 2].

The study by Zhou et al. in the Journal of the American Geriatrics Society exemplifies this paradigm shift [2]. Their work on automated frailty identification from clinical notes revealed striking performance disparities between general and specialized models. While general BERT achieved an area under the curve of only 0.64, medical-specific models such as PubMedBERT and BioClinicalBERT achieved 0.87, demonstrating superior capability in identifying complex clinical conditions [2]. This performance gap is not merely statistical; it represents the difference between clinically useful tools and inadequate systems that could compromise patient care. Similar findings have been reported in other specialized medical applications, including the LiVersa system for liver disease management [1].

The advantages of domain-specific medical large language models extend beyond improved accuracy metrics. These specialized systems possess an intrinsic understanding of medical terminology, clinical reasoning patterns, and the nuanced language healthcare professionals use in documentation [3, 4]. General-purpose models, despite their impressive capabilities in other domains, often struggle with medical abbreviations, contextual meanings of clinical terms, and the critical distinctions that can determine appropriate patient care [2]. For instance, distinguishing between “stable” and “unstable” angina requires not just linguistic comprehension but a deep understanding of the clinical implications that specialized models trained on medical literature inherently possess [5].

Furthermore, as Ucdal et al. correctly noted in their critique of LiVersa, medical large language models must incorporate diverse international clinical guidelines to ensure equitable care across global populations [3]. A model trained solely on American Association for the Study of Liver Diseases guidelines may provide recommendations that conflict with European or Asian standards of care [3]. This internationalization requirement demands purposeful development strategies that general-purpose models cannot adequately address through simple fine-tuning. The integration of multiple guideline sources represents a critical requirement for global healthcare applicability [1, 3].

The interpretability challenge presents another compelling argument for specialized development. Healthcare applications demand transparent, explainable artificial intelligence that clinicians can trust and validate [3, 6]. Domain-specific models can be architected from inception with interpretability mechanisms such as LIME (Local Interpretable Model-agnostic Explanations) or SHAP (SHapley Additive exPlanations), providing clinicians with clear insights into the reasoning behind AI-generated recommendations [3]. This transparency is not merely desirable but essential for clinical adoption and regulatory compliance, as emphasized in recent medical AI governance frameworks [6, 7].
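To make the interpretability point concrete, the following minimal sketch attaches token-level SHAP attributions to a transformer-based note classifier. It is an illustration, not the method of Zhou et al.: the checkpoint name "our-org/frailty-classifier" and the "FRAIL" label are hypothetical placeholders for a PubMedBERT-style model fine-tuned on frailty annotations, while the SHAP/transformers pipeline integration itself is standard.

import shap
from transformers import pipeline

# Hypothetical fine-tuned PubMedBERT-style checkpoint for binary
# frailty classification; the model name and the "FRAIL" label are
# placeholders, not a published model.
classifier = pipeline(
    "text-classification",
    model="our-org/frailty-classifier",
    top_k=None,  # return scores for every label, not just the top one
)

# Synthetic preoperative note excerpt, for illustration only.
note = (
    "82-year-old scheduled for hip replacement; reports unintentional "
    "weight loss, exhaustion, and slow gait; uses a walker at home."
)

# SHAP wraps the pipeline automatically and attributes each label's
# predicted probability to individual tokens of the input text.
explainer = shap.Explainer(classifier)
shap_values = explainer([note])

# Highlight which phrases pushed the prediction toward "FRAIL".
shap.plots.text(shap_values[0, :, "FRAIL"])

Surfaced alongside the model's flag in a chart-review interface, attributions of this kind would let a clinician check a prediction against the actual wording of the note, which is the sort of validation the governance frameworks cited above call for.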
Looking forward, we must recognize that developing domain-specific medical large language models is not an academic luxury but a clinical imperative. The investment required for such specialized development is justified by the potential to improve patient outcomes, reduce medical errors, and enhance clinical efficiency [2, 8]. We call upon funding agencies, healthcare institutions, and technology companies to prioritize the development of medical-specific language models. These efforts should include rigorous clinical validation, incorporation of international guidelines, built-in interpretability features, and continuous evaluation against human expert performance [1-3]. Only through such dedicated development can we realize the full potential of artificial intelligence in transforming healthcare delivery while maintaining the highest standards of patient safety and clinical excellence.

Mete Ucdal: conceptualization, manuscript drafting, critical revisions. Mihriban Gungor: data curation, literature review, manuscript drafting. Elif Gecegelen: literature review, manuscript editing. Mustafa Cankurtaran: supervision, critical revisions, final approval.

The authors declare no conflicts of interest.

This publication is linked to a related Reply by Zhou and Gabriel. To view this article, visit https://doi.org/10.1111/jgs.70177.
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,391 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,257 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,685 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,501 citations