OpenAlex · Updated hourly · Last updated: 05.04.2026, 10:16

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Domain-Specific LLMs in Clinical Medicine: Identifying Preoperative Frailty From Clinical Notes

2025 · 0 citations · Journal of the American Geriatrics Society · Open Access

Citations: 0 · Authors: 4 · Year: 2025

Abstract

The rapid evolution of artificial intelligence in healthcare has reached a critical juncture where we must address a fundamental question: should we rely on general-purpose large language models adapted for medical tasks, or invest in developing specialized models designed specifically for clinical applications? Domain-specific large language models (LLMs) are models trained or fine-tuned on clinical corpora (such as PubMed abstracts or electronic health records) to enhance performance in specialized medical tasks. Recent publications in leading medical journals provide compelling evidence that domain-specific models represent not merely an incremental improvement, but a necessary evolution in medical AI [1, 2]. The study by Zhou et al. in the Journal of the American Geriatrics Society exemplifies this paradigm shift [2]. Their work on automated frailty identification from clinical notes revealed striking performance disparities between general and specialized models. While general BERT achieved an area under the curve (AUC) of only 0.64, medical-specific models such as PubMedBERT and BioClinicalBERT achieved 0.87, demonstrating superior capability in identifying complex clinical conditions [2]. This performance gap is not merely statistical; it represents the difference between clinically useful tools and inadequate systems that could compromise patient care. Similar findings have been reported in other specialized medical applications, including the LiVersa system for liver disease management [1]. The advantages of domain-specific medical large language models extend beyond improved accuracy metrics. These specialized systems possess an intrinsic understanding of medical terminology, clinical reasoning patterns, and the nuanced language healthcare professionals use in documentation [3, 4].
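The AUC metric behind the 0.64-versus-0.87 comparison can be computed directly from a model's scores via the rank-sum (Mann-Whitney) formulation. A minimal sketch in plain Python; the labels and scores below are illustrative values, not data from the study:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth labels (e.g., frail / not frail)
    scores: iterable of model probabilities for the positive class
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute AUC")
    # Fraction of (positive, negative) pairs where the positive example
    # gets the higher score; ties contribute half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative call with made-up labels and scores:
print(auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # perfect ranking -> 1.0
```

An AUC of 0.5 corresponds to chance-level ranking, which is why the jump from 0.64 to 0.87 is clinically meaningful rather than marginal.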
General-purpose models, despite their impressive capabilities in other domains, often struggle with medical abbreviations, contextual meanings of clinical terms, and the critical distinctions that can determine appropriate patient care [2]. For instance, distinguishing between “stable” angina and “unstable” angina requires not just linguistic comprehension but a deep understanding of the clinical implications that specialized models trained on medical literature inherently possess [5]. Furthermore, as Ucdal et al. correctly noted in their critique of LiVersa, medical large language models must incorporate diverse international clinical guidelines to ensure equitable care across global populations [3]. A model trained solely on American Association for the Study of Liver Diseases guidelines may provide recommendations that conflict with European or Asian standards of care [3]. This internationalization requirement demands purposeful development strategies that general-purpose models cannot adequately address through simple fine-tuning. The integration of multiple guideline sources represents a critical requirement for global healthcare applicability [1, 3]. The interpretability challenge presents another compelling argument for specialized development. Healthcare applications demand transparent, explainable artificial intelligence that clinicians can trust and validate [3, 6]. Domain-specific models can be architected from inception with interpretability mechanisms like LIME (Local Interpretable Model-agnostic Explanations) or SHAP (SHapley Additive exPlanations), providing clinicians with clear insights into the reasoning behind AI-generated recommendations [3]. This transparency is not merely desirable but essential for clinical adoption and regulatory compliance, as emphasized in recent medical AI governance frameworks [6, 7]. 
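The perturbation idea underlying explainers such as LIME and SHAP can be illustrated with a toy leave-one-token-out attribution. The `keyword_model` below is a hypothetical stand-in for a real classifier (illustration only, not anything from the cited work), and the attribution loop strips away LIME's sampling and local weighting to show only the core mechanism:

```python
def keyword_model(text):
    """Hypothetical stand-in for an LLM classifier: scores a note for
    frailty by counting indicative terms (illustration only)."""
    cues = {"frail", "falls", "weakness", "assistance"}
    words = text.lower().split()
    return sum(w.strip(".,") in cues for w in words) / max(len(words), 1)


def token_attributions(model, text):
    """Leave-one-token-out attribution: how much does the score drop
    when each token is removed? This is the perturbation idea behind
    LIME/SHAP, without their sampling and weighting machinery."""
    tokens = text.split()
    base = model(text)
    attributions = {}
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        attributions[tok] = base - model(reduced)
    return attributions


attr = token_attributions(keyword_model, "patient reports frequent falls and weakness")
# Clinically salient tokens ("falls") receive positive attribution;
# filler tokens ("patient") do not.
```

Production explainers add sampling over many perturbations and a locally weighted surrogate model, but the clinician-facing output is the same shape: a per-token contribution to the prediction.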
Looking forward, we must recognize that developing domain-specific medical large language models is not an academic luxury but a clinical imperative. The investment required for such specialized development is justified by the potential to improve patient outcomes, reduce medical errors, and enhance clinical efficiency [2, 8]. We call upon funding agencies, healthcare institutions, and technology companies to prioritize the development of medical-specific language models. These efforts should include rigorous clinical validation, incorporation of international guidelines, built-in interpretability features, and continuous evaluation against human expert performance [1-3]. Only through such dedicated development can we realize the full potential of artificial intelligence in transforming healthcare delivery while maintaining the highest standards of patient safety and clinical excellence.

Mete Ucdal: conceptualization, manuscript drafting, critical revisions. Mihriban Gungor: data curation, literature review, manuscript drafting. Elif Gecegelen: literature review, manuscript editing. Mustafa Cankurtaran: supervision, critical revisions, final approval.

The authors declare no conflicts of interest.

This publication is linked to a related Reply by Zhou and Gabriel. To view this article, visit https://doi.org/10.1111/jgs.70177.


Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Biomedical Text Mining and Ontologies