This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Using generative artificial intelligence in clinical practice: a narrative review and proposed agenda for implementation
Citations: 2
Authors: 5
Year: 2025
Abstract
Generative artificial intelligence (GenAI) is any computer system capable of generating text, images, or other types of content, often in response to a prompt or question entered through a chat interface. GenAI comprises large language models (LLMs) and other general-purpose foundation models powered mostly by generative pre-trained transformer (GPT) deep learning technology. Compared with traditional AI models using single data modalities for specific classification or prediction tasks, GenAI comprises task-agnostic, increasingly multimodal models that learn shared representations of different data types and, using suitable prompts, may perform never-before-seen tasks.1 GenAI tools (also termed solutions or applications) are compelling because, unlike traditional AI, they are conversant, interacting directly with humans and generating human-like responses to prompts. These tools, in the form of ChatGPT and other GenAI chatbots, have very quickly captured the interest of researchers, clinicians and industry. Anecdotally, certain GenAI tools, such as ambient AI scribes and assistants, are already being used in many practice areas.2, 3 In the UK, one in five general practitioners now routinely use GenAI for various tasks.4 At the time of submission, this rapid uptake was occurring with little guidance on what use cases (tasks or clinical indications) are most amenable to GenAI, how GenAI tools intended for clinical practice should be used, evaluated and governed, and how to safeguard reliability, safety, privacy, and consent. In addressing these issues, we undertook a narrative review of existing literature, and using this evidence, we propose a phased, risk-tiered approach to implementing GenAI tools, discuss risks and mitigations, and consider factors likely to influence adoption of GenAI by both clinicians and health services. 
Although GenAI encompasses both text and image generation, this review primarily focuses on text-based applications in clinical practice, with image-related applications limited to report generation rather than image generation. Box 1 contains a glossary of terms used when describing GenAI. We searched PubMed and Google Scholar for articles published between 1 January 2022 and 31 August 2024 using the search terms “generative AI”, “large language models”, “clinical practice” or “health care”. We focused on review articles and grouped them into key application domains to inform our implementation framework: clinical documentation (16), operational efficiency (20), patient safety (11), clinical decision making (42), and patient self-care (4). Seven reviews covering all these domains were also retrieved.5-11 From these reviews, we extracted references outlining the problem(s) being addressed and exemplars of implemented GenAI tools used to solve them. We noted considerable heterogeneity in study design and methodological rigour, and a relative paucity of real-world implementations across several domains. Despite these limitations in the current evidence, our review suggests that GenAI tools could be implemented over five phases (Box 2; EMRs = electronic medical records). These are sequenced according to increasing levels of patient risk, task complexity, and implementation effort, and decreasing levels of current technical maturity and evidence of safety and effectiveness. The phased approach allows careful introduction of GenAI, beginning with tools that primarily enhance administrative efficiency (lower patient risk) and progressing to those directly influencing clinical decisions and patient self-care (higher patient risk and requiring regulatory approval).
Automating clinical documentation: Doctors in clinics can spend up to 2 hours on documentation for each hour of direct clinician–patient interaction;12 hospital residents and nurses spend up to 25%13 and 60%14 respectively of shift time on documentation. Ambient GenAI tools capable of voice transcription and note generation during doctor–patient encounters can decrease documentation time by up to 25% (“keyboard liberation”)15-17 and allow more attentiveness to patients. Similarly, scribes in nurse–patient encounters can double the time the nurse spends on direct patient care.18 Ambient GenAI tools can also generate a readily understood patient summary,19 potentially increasing satisfaction and adherence to care. Ambient scribes could also provide real-time advice, such as highlighting missed items in the history or overlooked investigation results.20

Synthesising patient information from medical records: When interviewing new patients in the clinic or on ward rounds, clinicians can spend up to a third of the encounter retrieving, reading and synthesising patient summaries from electronic medical records (EMRs) before patient contact.21 GenAI can generate easily interpreted summaries of pertinent history, investigation results and treatments more accurately than clinicians22 while reducing this familiarisation time by around 20%.23

Generating discharge summaries from EMRs: Writing discharge summaries is time consuming, error prone and often slow in reaching recipients,24 with suboptimal patient outcomes.25 GenAI can generate summaries that are more accurate than those of junior doctors in 90% of cases,26 are available at discharge,27 and lessen by a third the time senior doctors spend supervising complex cases.28

Optimising consent: The reading grades (school-grade level of reading skill required for understanding) of most consent forms exceed the population average (8th grade) and often lack procedure-specific information required for informed consent.
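The reading-grade concept above can be made concrete with the Flesch-Kincaid grade level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. A minimal sketch, assuming a crude vowel-group syllable heuristic; the example texts are invented for illustration, not drawn from the review:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minus a common silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Jargon-heavy consent wording versus a plain-language rewrite (both invented).
consent = ("The percutaneous intervention carries anaesthetic risks. "
           "Postoperative complications include haemorrhage and infection.")
plain = ("You will be given medicine to make you sleep. "
         "Some people bleed or get an infection after.")
print(round(fk_grade(consent), 1), round(fk_grade(plain), 1))
```

The jargon-heavy version scores well above the 8th-grade population average, while the plain-language rewrite falls below it.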
Clinicians can input verified text into GenAI chatbots to produce more comprehensible, informative and empathetic versions of consent documents that take less time to read.29

Automating routine administrative tasks: Scheduling clinic appointments, organising staff rosters, drafting minutes and policy documents, and coding patient records are all labour-intensive but potentially automatable tasks. For example, GenAI could more quickly create safer and fairer rosters,30 expedite coding for faster remuneration,31 and improve operational decision making.32

Improving hospital capacity management: Overcrowded emergency departments, access block to inpatient beds, delayed discharges and avoidable readmissions are commonplace. GenAI-enabled patient triage and discharge planning33, 34 and command-and-control patient flow systems could assist clinicians and bed managers in optimising bed use.35

Improving workflows in image-based disciplines: Taking radiology as the most mature domain, heavy workloads stress radiologists and delay the issuing of reports, which can compromise patient care.36 Tools that automate image interpretation and structured reporting37 can cut total reporting time by a third,38 reducing radiologist burnout and shortening report turnaround times.39 GenAI could also optimise referral and reporting prioritisation, patient scheduling and preparedness, and scan protocoling.40 Similar benefits in prioritising, interpreting and reporting digital pathology slides could also be realised with GenAI.41

Facilitating gathering and trending of data: Care-related near misses or adverse events such as medication harm and delirium are currently ascertained retrospectively from medical records or incident reports, with significant lag times.
Such data could be captured, quantified and trended in real time using LLMs applied to EMRs, facilitating more timely recognition of unsafe situations warranting remedial intervention.42-44

Expediting analysis of data: Considerable time and effort are spent gathering, analysing and reporting quality and safety measures and incident data, and undertaking root cause analyses, often with little impact on care.45, 46 GenAI could aggregate and analyse these data more efficiently,47 identify safety hazards and contributors more quickly, automate audit48 and survey analyses,49 and allow quality and safety staff to redirect resources to proactive safety improvement.50

Retrieving medical evidence to inform decision making: Current online literature search systems (eg, PubMed) take time to search and synthesise data, are limited to simple keyword queries, and often retrieve few relevant, actionable reports.51 GenAI, particularly using retrieval augmented generation, can quickly and iteratively, in response to serial prompts, screen the available literature and synthesise high quality, actionable evidence with supporting references,52 although the ability of LLMs to assess the risk of bias of clinical trials remains limited.53

Reducing diagnostic error: Diagnostic error accounts for 60–70% of all medical errors causing harm, mostly arising from cognitive biases in reasoning.54 Responding to clinician prompts, GenAI could suggest more accurate differential diagnoses or detect and reduce misdiagnosis,55 particularly for complex, undifferentiated general medical cases involving non-expert clinicians.56

Personalising therapies: The response of many patients to specific therapies for diagnosed and confirmed diseases remains unpredictable.57 Applying GenAI to EMRs and genomic databases could identify patient genotypes or phenotypes associated with favourable or unfavourable treatment responses, as seen in various oncological applications.58 More rigorous evaluation will be
required of consumer-facing applications relying on the do-it-yourself proficiency of users who may lack medical expertise, especially as GenAI chatbots could give seemingly confident, personalised but inappropriate advice.59

Providing medical advice: GenAI symptom checkers can diagnose conditions better than laypeople using traditional online information sources, but remain inferior to vetting by clinicians, with triage decisions for acute conditions particularly problematic.60 However, GenAI chatbots fine-tuned on curated medical knowledge could reliably identify patients’ needs and provide informed suggestions.61 Chatbots that can process and draft responses to messages and queries from patients with diagnosed conditions under the care of clinicians can also alleviate clinician burden and enhance patient engagement.62

Improving chronic disease self-management: GenAI chatbots for managing chronic diseases seem well accepted by patients in supporting mental health, physical activity and behaviour change for selected conditions,63 but evidence of effects on patient outcomes is limited.64 Wearable devices integrated with GenAI can potentially detect adverse health states such as falls or clinical deterioration.65

Several risks to patient safety and quality of care require careful consideration.66-73 These relate to: reliability (errors, hallucinations); consistency (different responses to the same question); explainability (few rationales for responses); limited understanding of context; biased responses due to unrepresentative training data; misuse of prompts; potential privacy breaches; little auditability of tool processes and outputs; workflow disruptions and job displacement; depersonalised care; clinician over-reliance on GenAI and consequent de-skilling; limited clinician and patient acceptance; and costs and carbon footprint. However, risk mitigation strategies exist and will continue to evolve (Box 3).
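One widely used mitigation for unreliable responses is grounding answers in retrieved sources, as with the retrieval augmented generation approach mentioned earlier for evidence retrieval. A toy sketch of that pattern, with term-overlap ranking standing in for a real embedding index; the corpus, citation keys and function names are invented for illustration:

```python
"""Toy retrieval augmented generation pipeline: retrieve passages by term
overlap (a stand-in for vector search), then build a grounded prompt that
asks an LLM to cite only the retrieved sources."""
import re
from collections import Counter

PASSAGES = {  # invented mini-corpus keyed by illustrative citation labels
    "scribe_trial": "Ambient scribes reduced documentation time by a quarter in outpatient clinics.",
    "summary_study": "LLM-generated discharge summaries matched junior doctors in most audited cases.",
    "triage_review": "Chatbot triage of acute symptoms remained inferior to clinician review.",
}

def tokens(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank passages by the number of terms shared with the query."""
    q = tokens(query)
    return sorted(PASSAGES, key=lambda key: -sum((q & tokens(PASSAGES[key])).values()))[:k]

def build_prompt(query: str) -> str:
    """Grounded prompt: the model is asked to answer only from cited sources."""
    context = "\n".join(f"[{key}] {PASSAGES[key]}" for key in retrieve(query))
    return f"Answer using only these sources, citing their keys:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much documentation time do ambient scribes save?"))
```

A production system would replace the term-overlap scorer with dense embeddings over a curated medical corpus and pass the prompt to an LLM; the grounding-and-citation structure is the part that constrains hallucination.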
Although many of these risks are common to all forms of AI, certain risks, such as hallucinations, prompt misuse and limited auditability, are peculiar to GenAI. GenAI is also not yet capable of higher-order reasoning, contextual understanding, capturing sensory and nonverbal cues, or making moral or ethical judgements. Decision support LLMs may produce inconsistent advice in response to the same queries and be as prone to cognitive biases as humans.74 GenAI alters its behaviour in response to new data inputs or to updating or recalibration of its operations, which may go unannounced. Importantly, acceptable GenAI performance on one “benchmark” task does not translate to other, seemingly related tasks for which it was not trained.75 This challenges the generalisability of any single, point-in-time evaluation of an evolving model with a large potential task capability. Ensuring the quality of the massive datasets used to train GenAI models is challenging compared with traditional AI models trained on smaller, targeted datasets. The behaviour of hugely complex LLMs with billions of parameters performing different tasks cannot be fully understood, even when their technical architecture is known. Evaluation and regulation of GenAI tools, with their limitless and changing arrays of inputs and outputs, is therefore hugely challenging. A single, fit-for-purpose pre-deployment assessment and approval of all GenAI tools, as software as a medical device (SaMD), may not suffice for tools that continue to learn and adapt. Currently, the Therapeutic Goods Administration (TGA) regulates some but not all AI tools designed to support clinical decision making as SaMD, but exempts tools, such as GenAI scribes, that provide only documentation or administrative assistance. The TGA’s remit for consumer-facing AI tools remains undefined.
Current regulatory and accreditation processes,76 coupled with amendments to society-wide laws (eg, privacy, consumer and anti-discrimination laws), may be sufficient to cover many GenAI applications. Two regulatory approaches are possible: an application-centric approach and a system-centric approach. In an application-centric approach, individual tools are evaluated according to task criticality and patient risk. For high risk diagnostic or treatment applications (phases 4 and 5), the tool may be frozen pre-deployment and evaluated in a standard pathway (versus a fast pathway) using pragmatic clinical trials (Box 4).77-79 If approved, the tool could later be locked down, re-opened, retrained (if needed), and re-evaluated for re-approval if any substantive change in function or deviation from benchmark tasks is seen. The US Food and Drug Administration calls for AI developers to provide an algorithm change protocol describing how modifications are generated and validated.80 Lower risk tools (phases 1 and 2) may pass through a fast pathway, requiring only observational studies or post-deployment verification studies for approval. A standardised, actionable, risk-based checklist for evaluating GenAI along multiple axes, including post-deployment monitoring of real-world performance and clinical impact, is needed,81-83 as are similar checklists for identifying and resolving ethical concerns.84, 85 Importantly, any GenAI tool, including tools with regulatory approval, must undergo a standardised clinical validation process at the local level using local data. Using open-source or open-weight tools hosted on local servers may be the best option for protecting privacy, but requires in-house data scientists and technical staff for model training and tool deployment.
Clinical validation stages differ by risk level, with lower risk applications (phases 1–3) and higher risk applications (phases 4 and 5) warranting different levels of scrutiny. A complementary system-centric approach requires tool developers and deployers (ie, large-scale health services) to wrap a quality assurance framework86 around their GenAI activities, comprising both risk mitigation (Box 3) and life cycle monitoring and evaluation. This framework may include statistical process control analyses that define acceptable bounds around tool accuracy, or analyses of downstream effects on proximal clinical outcomes (eg, adverse events, mortality).87 Additional proxy measures of tool use, such as tracking the number of human-initiated corrections to LLM-created documents, could also be used.88 Developers and deployers might be accredited by an appointed authority to use GenAI tools depending on how well they measure, report and satisfy these parameters. Health services may need to establish dedicated, multidisciplinary clinical AI units to perform these tasks and provide the necessary human expertise and digital infrastructure.89 Such units may also specialise in validating and piloting specific applications before deployment in other similar or affiliated services, given the limited capacity of some services to undertake these tasks for every GenAI tool they may want to deploy.90 A balance is therefore needed between bespoke and more centralised evaluations, with the latter preferred for widely used, high value, high risk or high impact solutions.
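The statistical process control idea above can be sketched as a simple p-chart over a monitored proportion, here the rate of human-initiated corrections to LLM-created documents. All numbers, names and thresholds are illustrative assumptions, not from the review:

```python
import math

def p_chart_limits(baseline_rate: float, n: int) -> tuple[float, float]:
    """3-sigma control limits for a proportion observed in samples of size n."""
    sigma = math.sqrt(baseline_rate * (1 - baseline_rate) / n)
    lower = max(0.0, baseline_rate - 3 * sigma)
    upper = min(1.0, baseline_rate + 3 * sigma)
    return lower, upper

def flag_out_of_control(weekly_corrections: list[int], docs_per_week: int,
                        baseline_rate: float) -> list[bool]:
    """Flag weeks where the observed correction rate breaches the control limits."""
    lower, upper = p_chart_limits(baseline_rate, docs_per_week)
    return [not (lower <= c / docs_per_week <= upper) for c in weekly_corrections]

# Hypothetical example: 200 LLM-drafted summaries per week, 5% baseline correction rate.
flags = flag_out_of_control([9, 11, 8, 27, 10], docs_per_week=200, baseline_rate=0.05)
print(flags)  # → [False, False, False, True, False]
```

A flagged week (here, 27 corrections) would trigger review of the tool, of recent model updates, or of the clinical context, rather than waiting for retrospective audit.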
Because of its human-like interactivity, GenAI is rapidly gaining acceptance by frontline clinicians for certain tasks (eg, ambient scribes), bringing a cultural shift in how medicine is practised and providing more value over time.91 Clinicians will likely adopt GenAI for common tasks where it has demonstrated acceptable accuracy and safety, is easy to use, aligns with clinical workflows, and enhances clinician–patient interactions.92 Clinician trust will rely on clearly articulated use cases, well defined risk-based clinical testing processes and evidence generation, and ongoing monitoring of performance linked to original indications.93 Consumer trust will centre on tool accuracy, transparency around GenAI use in their care, and privacy assurances.94 Meaningful co-design with diverse consumer groups can help identify concerns and build appropriate safeguards into GenAI implementation.95

Users of GenAI, both health professionals and consumers, will need education and training in how to use it, how to apply human oversight to its outputs, and how to undertake appropriate consent processes when using AI in care. Each GenAI tool should come with a label or model card providing information on its function, training data, performance, bias, safety, technical requirements, prompting, and conditions of appropriate use.

Our narrative review identified several clinician and system factors likely to influence GenAI adoption, some common to all forms of AI. One is the need for collaboration involving researchers, data scientists, clinicians and consumers in designing and evaluating GenAI tools and ensuring they are fit for purpose. Health services must also decide whether to adopt open-source GenAI tools with local hosting or commercial tools, and how to integrate these tools with EMRs, through application programming interfaces or by embedding them within EMRs. The capacity of health services to assess the performance of GenAI tools before and subsequent to deployment is limited; dedicated clinical AI units could extract data from EMRs and use it to train and validate tools, although such capabilities will not be available to all services. It is also important for governments and health services to enhance access for developers to patient data from EMRs and other sources for training GenAI while safeguarding data privacy; health data are currently fragmented, often lack data standards, and are accessed on multiple platforms using different access processes rather than common data models.

Integrating GenAI into clinical practice should be guided by implementation science approaches that optimise adoption; for patients, acceptance or rejection of GenAI advice must be defined and supported, drawing on strategies from experience in implementing health care AI.

This narrative review has limitations. Our evidence base comprises reviews published over the period in which LLMs such as ChatGPT were emerging, and our selection of use cases is not intended to be exhaustive. As GenAI is rapidly evolving and there is a time lag between publication of original studies and their incorporation into reviews, we acknowledge that some relevant articles may have been missed. We note the limitations of current evidence and the need for more real-world implementations to strengthen the evidence on the quality and safety of care, and to identify the tasks and indications for which this new and rapidly evolving technology is best suited.

The strength of GenAI is its versatility across multiple tasks and its conversant, human-like language, rather than superior performance on every task; some tasks will be better served by fine-tuned machine learning models. Technical expertise will be required, coupled with rigorous evaluation processes that allow users to respond quickly to errors and changes. We propose a phased, risk-tiered implementation of GenAI tools into health care, coupled with risk mitigation strategies. GenAI will be transformative, but careful, staged introduction may improve the quality of, and access to, current care.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations