OpenAlex · Updated hourly · Last updated: 2026-03-28, 04:06

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Agentic Large Language Models for Healthcare: Current Progress and Future Opportunities

2025 · 13 citations · Medicine Advances · Open Access

13 citations · 1 author · 2025

Abstract

Agentic large language models (LLMs) combine vanilla LLMs' reasoning, planning, and reflecting capabilities, acquired from vast pre-training datasets, with specialized tools to enhance problem-solving. To understand the current progress and future opportunities of agentic LLMs for healthcare, this commentary conducted a comprehensive literature search in PubMed and summarized the key takeaways.

Large language models (LLMs) have revolutionized the healthcare industry by advancing artificial intelligence (AI) to expert-level capabilities in understanding, reasoning, and generating human language [1, 2]. However, standard LLMs occasionally make basic errors, such as miscalculations in simple arithmetic problems [3], prompting the development of agentic LLMs (ALLMs) to address these limitations. ALLMs combine vanilla LLMs' robust reasoning, planning, and reflecting capabilities, acquired from vast pre-training datasets, supervised instruction tuning, and reinforcement learning from human feedback (RLHF), with specialized tools such as calculators to enhance task-specific problem-solving [4]. These advanced reasoning, planning, and reflecting abilities enable ALLMs to independently execute complex, multistep clinical tasks and to surpass conventional LLMs, which rely passively on clinician-provided prompts to initiate and sustain generation. Specialized analytical tools integrated into ALLMs streamline intermediate analytical steps and complement information from other modalities, such as medical images and clinical signals [5], which traditional LLMs, limited by their pre-training on internet-derived text corpora, struggle to process [6]. Additionally, ALLMs incorporate short-term memory within individual conversation sessions, retaining interaction information for iterative reasoning and planning, and long-term memory across multiple sessions, enabling them to recall decisions made in similar historical tasks.
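The tool-routing and short-term memory ideas above can be made concrete with a minimal sketch. This is not any specific framework's interface: `run_agent_step`, the `TOOL:` message convention, and the tool registry are illustrative assumptions; the point is that arithmetic is delegated to a deterministic calculator rather than trusted to the LLM, and each step is appended to a session memory.

```python
# Minimal sketch of one agentic step: if the (hypothetical) LLM output
# requests a tool, dispatch it deterministically and record the
# observation in short-term (per-session) memory.
import re

def calculator(expression: str) -> str:
    """Evaluate a simple arithmetic expression deterministically."""
    # Whitelist digits and basic operators before evaluating.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError("unsupported expression")
    return str(eval(expression))  # safe here only because of the whitelist

TOOLS = {"calculator": calculator}

def run_agent_step(llm_output: str, memory: list[str]) -> str:
    """Dispatch a tool request like 'TOOL:calculator:12*7' and log both
    the request and the observation to session memory."""
    memory.append(llm_output)
    if llm_output.startswith("TOOL:"):
        _, name, arg = llm_output.split(":", 2)
        observation = TOOLS[name](arg)
        memory.append(f"OBSERVATION: {observation}")
        return observation
    return llm_output

memory: list[str] = []
result = run_agent_step("TOOL:calculator:12*7", memory)  # -> "84"
```

In a full agent loop, the observation and the accumulated memory would be fed back to the LLM for the next reasoning step.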
Augmenting ALLMs with both short-term and long-term memory significantly reduces the likelihood of oversights or errors [7], a capability that standard LLMs can achieve only through intricate prompt tuning and complex engineering interventions. Moreover, ALLMs integrate external knowledge to mitigate challenges such as hallucination in vanilla LLMs, enabling the generation and verification of evidence-based clinical recommendations drawn from diverse medical knowledge sources [8]. Standard LLMs can acquire clinical knowledge through fine-tuning; however, the high computational cost of retraining hinders their ability to keep pace with rapidly evolving medical knowledge [9]. By contrast, ALLMs can access research findings, clinical case reports, and updated guidelines cost-effectively without additional training. These augmentations enable ALLMs to tackle complex tasks that require iterative reasoning, planning, and action, aligning them more closely than standard LLMs with the gold-standard clinical procedure of gathering cues, generating and interpreting hypotheses, refining them, and repeating the process as needed [10]. Figure 1 depicts the general architecture of ALLMs, which integrate LLMs as the central intelligence unit, leverage tools to enhance problem-solving, use memory to support advanced planning and reflection, and apply knowledge systems to gather contextual information.

[Figure 1. General architecture of agentic LLMs. LLM, large language model.]

To elucidate the current landscape of ALLM-assisted healthcare, we performed a comprehensive literature search in PubMed, as shown in Figure 2, and summarize the key takeaways in the following paragraphs. A pragmatic ALLM-based system was proposed by Zhou et al. [11] for multi-omics analyses.
It requires users to input the data path, data description, and task objectives, and then autonomously calls built-in LLMs to generate analytical plans and corresponding code for downstream execution. To mitigate errors and refine results, it retains a memory of previous inferences, actions, and outcomes. When user demands are unmet or errors are detected, another round of agentic analysis is initiated, with the relevant logs reprocessed by LLM-based intelligence engines. Beyond memory logs accumulated from an ALLM's past behavior, external knowledge from behavioral science has been leveraged to improve the empathy and actionability of fitness-coaching chatbots. Similarly, KNOWNET [12] augments LLMs by integrating knowledge graphs to improve their accuracy and facilitate the structured exploration of non-pharmaceutical interventions for Alzheimer's disease. Clinical web pages can also be queried and autonomously converted into document objects, enabling the retrieval of real-time online knowledge [13].

[Figure 2. Literature search pipeline in PubMed to identify relevant articles published from January 1, 2022 to October 19, 2024. We followed previous systematic reviews (1, 2) to design the groups of keywords: (1) "Large Language Model" "LLM" "Generative AI" "ChatGPT" "GPT" "Llama"; (2) "Health" "Medical" "Clinical" "Healthcare" "Medicine"; and (3) "Agent" "Agentic" "Multi-agent."]

Although the single-agent LLMs discussed above have demonstrated remarkable capabilities, their limitations in handling highly complex or diverse healthcare tasks have prompted interest in more advanced approaches [14]. Consequently, researchers are shifting to multi-agent frameworks, wherein specialized agents collaborate to tackle distinct aspects of a problem, harnessing the collective intelligence of agents with unique capabilities [8].
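The three keyword groups used for the Figure 2 PubMed search can be assembled mechanically: terms within a group are OR-ed, the groups are AND-ed, and a date filter restricts the publication window. The sketch below is a plausible reconstruction, not the authors' exact query string; the `[dp]` (date of publication) range syntax follows PubMed's documented field tags.

```python
# Sketch: composing a PubMed boolean query from the three keyword groups.
GROUPS = [
    ["Large Language Model", "LLM", "Generative AI", "ChatGPT", "GPT", "Llama"],
    ["Health", "Medical", "Clinical", "Healthcare", "Medicine"],
    ["Agent", "Agentic", "Multi-agent"],
]

def build_query(groups, start="2022/01/01", end="2024/10/19"):
    # OR within each group, AND across groups, plus a date-range filter.
    clauses = ["(" + " OR ".join(f'"{t}"' for t in group) + ")" for group in groups]
    return " AND ".join(clauses) + f' AND ("{start}"[dp] : "{end}"[dp])'

query = build_query(GROUPS)
```

Such a string can be pasted into the PubMed search box or sent through the NCBI E-utilities `esearch` endpoint.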
A foundational two-agent system was exemplified by Alghamdi and Mostafa [15], with one agent dedicated to generating medical guidance and a second agent responsible for validating the trustworthiness of the generated responses. Similarly, a symptom-disease chatbot using a dual-agent design was developed by Ananta et al. [16]: the primary agent addresses user queries using a well-established knowledge graph, and when it fails to retrieve relevant results, a secondary agent, fine-tuned on GPT-3, is activated to respond. A more complex system employs five sequential agents responsible for medical guideline classification, question retrieval, matching evaluation, intelligent question answering, and results evaluation with source citation. In contrast to such specific job assignments, the agents in the study by Ghafarollahi and Buehler [17] are designated the general roles of user proxy, planner, assistant, critic, and group chat manager to collaboratively address sophisticated protein analysis and design.

In addition to modules centered on upgrading model performance, recent studies suggest that visual or audio interaction in ALLMs can enhance user engagement. The authors of Refs. [18, 19] demonstrated the critical role of avatar-like visual embodiment in achieving therapeutic success with ALLM-based psychotherapy. For audio interaction, the authors of Refs. [20, 21] developed GPT-3.5-based chatbots tailored to elderly individuals and embedded speech recognition and text-to-speech tools to enable human-like conversation. My Care Questionnaire [22] and Convai [23] further explore combined visual and audio interactivity to facilitate health data entry for individuals with sensory impairments and to simulate patient encounters.
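The generate-then-validate pattern behind two-agent systems such as the one by Alghamdi and Mostafa [15] can be sketched as a small control loop. Here `generate` and `validate` are hypothetical callables standing in for two LLM-backed agents; the retry budget and feedback protocol are illustrative assumptions, not the published design.

```python
# Sketch: a generator agent drafts an answer, a validator agent checks it,
# and the draft is revised with the validator's feedback until it passes
# or the retry budget is exhausted.
from typing import Callable

def generate_validated_answer(
    question: str,
    generate: Callable[[str, str], str],               # (question, feedback) -> draft
    validate: Callable[[str, str], tuple[bool, str]],  # (question, draft) -> (ok, feedback)
    max_rounds: int = 3,
) -> str:
    feedback, draft = "", ""
    for _ in range(max_rounds):
        draft = generate(question, feedback)
        ok, feedback = validate(question, draft)
        if ok:
            return draft
    return draft  # best effort after exhausting the retry budget

# Toy stand-ins: the validator rejects any draft lacking a source citation.
gen = lambda q, fb: "Take ibuprofen. [source: guideline]" if fb else "Take ibuprofen."
val = lambda q, d: ("[source:" in d, "cite a source")
answer = generate_validated_answer("headache?", gen, val)
```

In a real deployment both callables would wrap LLM endpoints, and the validator's feedback would be free-text critique rather than a fixed string.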
In advanced studies in which avatars were given diverse exteriors [24], multiple avatars were designed with distinct personas tailored to users' preferences in a digital assistant aimed at promoting physical activity [25].

Lastly, although ALLMs show promise, their reliability and safety require thorough validation [26], particularly for specialized medical applications, owing to risks of misdiagnosis, poor management, and gaps in clinical knowledge [1, 27]. Despite their integration with clinical knowledge-based guidelines, ALLMs still underperform human experts in detecting antimicrobial resistance mechanisms [28]. As illustrated in Figure 2, only 19 studies on ALLMs were identified among the 118 abstract-screened articles on LLM-assisted healthcare, demonstrating the limited attention this advanced technique has received in the broader community. This motivates the core aims of our commentary: to provide clinical experts with a foundational understanding of the latest research trends, to foster their collaboration with AI researchers, and to identify actionable scenarios for ALLMs that suit clinical needs [29] rather than imaginary applications driven by cutting-edge techniques or over-engineered frameworks. Additionally, deploying ALLM-based tools in healthcare requires aligning model behavior with both user preferences and strict adherence to clinical guidelines, which calls for further investigation into whether general-purpose instruction tuning, RLHF, and DeepSeek-inspired LLM self-evolution can ensure compliance in real-world deployment. Moreover, beyond clinician-conceptualized applications and AI-researcher-developed methodology, dataset preparation requires collaboration between AI researchers and clinicians to develop efficient annotation interfaces and to organize panel discussions that design annotator reference materials, disagreement-resolution approaches, and quality-assessment protocols [30].
Han Yuan: conceptualization, data curation, formal analysis, investigation, methodology, visualization, writing–original draft, writing–review and editing.
The author has nothing to report.
This study is exempt from review by the ethics committee as it does not involve human participants, animal subjects, or the collection of sensitive data.
The author has nothing to report.
The author declares no conflicts of interest.
The author has nothing to report.
