This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Artificial intelligence in medicine: A primer and recommendation
Citations: 6
Authors: 3
Year: 2024
Abstract
Artificial intelligence (AI) is a field of computer science that focuses on creating intelligent systems that can reason, learn, and act autonomously, exhibiting “human-like” thinking and behavior. Machine learning (ML), deep learning (DL), and large language models (LLMs) are all subsets of the rapidly evolving field of AI, and are poised to change healthcare.1, 2 In this paper, we explain these terms and provide examples where such approaches are already being applied to improve prediction, precision, or quality in healthcare delivery. We then make the case for integrating AI training across medical curricula, with specialty (fellowship) training available to clinicians, to avoid the pitfalls of a previous technological innovation, the electronic medical record.3

ML4 is a subfield of AI (Figure 1). Traditional computer programming entails giving very specific instructions to a computer to perform stipulated tasks. The field of ML instead seeks to make “machines do what we (as thinking entities) can do.”5 In other words, ML entails training models on vast amounts of data so that the computer recognizes patterns within the data to describe them, or to make predictions based on them, as we have seen with applications in medical imaging for over a decade.6 ML broadly entails three distinct approaches: supervised, unsupervised, and reinforcement learning. Supervised models are trained on data where outputs are preidentified and labeled. For example, a particular cluster of pixels on an X-ray has been preidentified by a subject matter expert as pneumonia (the “labeled outcome”), or certain permutations of lab values and vital signs have crossed a threshold for intensive care unit (ICU) admission (the “labeled outcome”), allowing the “machine” to recognize such patterns in new input data and then diagnose pneumonia or recommend an ICU admission.
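As a concrete, highly simplified illustration of the supervised workflow just described (labeled examples, a model fitted to them, predictions on entirely new cases), here is a sketch in Python using logistic regression. The two “lab values,” the admission rule, and all numbers are invented for illustration only, not clinical data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: two hypothetical "lab values" per patient.
X = rng.normal(size=(200, 2))
# The labeling rule (known to the expert, not the model): admit when the
# two values together are high. These are the "labeled outcomes."
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train a logistic regression model by gradient descent on cross-entropy.
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

# Apply the trained model to entirely new patients.
X_new = rng.normal(size=(100, 2))
predicted = sigmoid(X_new @ w + b) > 0.5
actual = X_new[:, 0] + X_new[:, 1] > 0
accuracy = float(np.mean(predicted == actual))
```

In practice, clinical models are trained on far richer features and, as noted below, validated on held-out data sets before deployment.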
Typically, these algorithms are validated on data sets distinct from the training data set, and then applied to entirely new data sets to describe, predict, or prescribe outcomes. Unsupervised learning allows the machine to look for patterns or associations that were not prelabeled or identified. Unsupervised ML algorithms can reveal new relationships in data sets, presented as clusters (e.g., patients in a certain ICU seem to be admitted either for short stays or for very long stays); associations (e.g., patients from a certain zip code were overrepresented in emergency department visits for diarrhea); or dimensionality reduction (e.g., of all the continuous vitals collected on telemetry, beat-to-beat variability seemed to be the most important predictor of readmission rate). In a recent study using unsupervised learning for automated detection of coronary artery disease subgroups, cluster membership (youngest/multiethnic, middle age/lowest medication adherence, etc.) was more informative than standard risk calculators for assessment of myocardial infarction, stroke, and mortality.7 Reinforcement learning replicates basic human behavior by “rewarding” the algorithm for a specific choice it makes every time it is presented with a decision, so the machine learns to prioritize the rewarded outcome. The machine learns over time, for example, to recognize all the various decisions made on the floor or in the ICU that reduced readmission rates, eventually nudging clinicians toward the prioritized decision when presented with a choice. DL is a subset of AI that, like ML, drives everyday applications we are familiar with, including facial recognition, voice assistants, and fraud detection.
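The clustering example above (short vs. very long ICU stays) can be sketched with a bare-bones k-means algorithm; the lengths of stay below are synthetic and chosen only to make the two unlabeled groups obvious:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ICU lengths of stay (days): a short-stay group and a
# long-stay group, with no labels attached.
stays = np.concatenate([rng.normal(2, 0.5, 50), rng.normal(14, 2.0, 50)])
X = stays.reshape(-1, 1)

# Bare-bones k-means with k=2: repeat (assign each point to its nearest
# centroid, then recompute the centroids) until the clusters stabilize.
centroids = np.array([[stays.min()], [stays.max()]])
for _ in range(20):
    distances = np.abs(X - centroids.T)        # distance to each centroid
    labels = distances.argmin(axis=1)          # nearest-centroid assignment
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

# The algorithm recovers the two admission patterns on its own.
short_mean, long_mean = sorted(float(c) for c in centroids.ravel())
```

No outcome was ever labeled here; the structure (two stay-length clusters) emerges from the data alone, which is the defining feature of unsupervised learning.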
DL is modeled on the neuronal circuitry of the human brain—it comprises, just like our brains, “neural networks” in which thousands or millions of software modules, or “nodes,” are interconnected and organized into layers; each node processes data and generates output, not dissimilar to our own neurons. Input-layer nodes pass data on to “hidden layers,” which adapt their behavior as they receive new information, capable of “learning” as we have discussed in the preceding sections, and pass data on to the output layer of nodes, capable of generating a range of outputs. This general process is called “propagation.” Like humans, DL models can learn from their “experience.” Through a process of “backpropagation,” the model calculates the errors in its own predictions and then adjusts the weights and biases of its internal functions by moving backward through the layers, improving the accuracy of its outputs. DL is resource intensive, requiring vast data (and hence storage) and computing power. DL approaches depend less on hand-crafted human inputs than ML models do, and can therefore scale to very large data sets and complex problems, enabling them to tackle challenges that were previously intractable. DL has already found applications in, and is rapidly changing, radiology, dermatology, pathology, and genomics, achieving physician-level accuracy in certain diagnostic tasks.2, 8, 9 Generative AI is a subset of DL that focuses on creating new content, such as text, images, speech, or other media, based on the patterns and characteristics learned from the training data. In the case of text generation, LLMs such as OpenAI's GPT-4, where GPT stands for Generative Pretrained Transformer, and Google's Bard are prime examples of generative AI. These models are trained on vast amounts of text data, allowing them to understand and generate human-like language.
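The two processes described above, propagation and backpropagation, can be sketched in a few lines. The network sizes and data below are arbitrary, and the backpropagated gradient is checked against a numerical estimate to show that it really is the slope of the prediction error:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 samples, 3 input features
y = rng.normal(size=(5, 1))   # targets the network should reproduce

# One hidden layer of 4 nodes and one output node, randomly initialized.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(W1_val):
    """Propagation: input layer -> hidden layer (tanh) -> output layer."""
    h = np.tanh(X @ W1_val + b1)
    out = h @ W2 + b2
    loss = 0.5 * np.mean((out - y) ** 2)   # prediction error
    return h, out, loss

h, out, loss = forward(W1)

# Backpropagation: push the error backward through the layers to obtain
# the gradient of the loss with respect to each weight.
d_out = (out - y) / len(y)               # dLoss/dOutput
d_hidden = (d_out @ W2.T) * (1 - h**2)   # chain rule through tanh
grad_W1 = X.T @ d_hidden                 # dLoss/dW1

# Sanity check: nudging one weight changes the loss by (gradient * nudge).
eps = 1e-6
W1_nudged = W1.copy()
W1_nudged[0, 0] += eps
numeric = (forward(W1_nudged)[2] - loss) / eps
```

A training loop would simply subtract a small multiple of this gradient from each weight and repeat, which is how the model “moves backward through the layers” to improve its outputs.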
The concept of “Transformers,” introduced in the seminal 2017 paper “Attention Is All You Need” by a team of Google scientists, has revolutionized the field of natural language processing. Transformers are a type of neural network architecture that uses self-attention mechanisms to weigh the importance of different words in a given context, enabling the model to understand how words and phrases relate to each other even when they are far apart in a sentence or paragraph. The development of Transformers has paved the way for the creation of LLMs like GPT-4, which is trained on an extensive data set including textbooks, articles, webpages, social media conversations, and news articles.10 These models can generate human-like text, engage in conversational interactions, and even assist with tasks such as writing, analysis, and question answering. LLMs' conversational abilities are expected to augment patient and provider interactions with virtual clinical assistants, improve electronic medical record interfaces for clinicians and providers, and transform interactive educational platforms to generate customized learning experiences for trainees.1, 11-13 In the case of image generation, generative models learn from large data sets of images to create novel images that share characteristics with the training data. One widely used architecture, the generative adversarial network, consists of two main components: a generator and a discriminator. The generator learns to create images that resemble the training data, while the discriminator learns to distinguish real images from the training set from generated images. Through an iterative process called adversarial training, the generator and discriminator compete against each other, resulting in the generator creating increasingly realistic images.
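The self-attention mechanism at the heart of the Transformer can be sketched in a few lines. The tiny random “embeddings,” dimensions, and projection matrices below are invented stand-ins for what a real model learns from data:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, d_model = 6, 8                  # six "words", 8-dimensional embeddings
X = rng.normal(size=(seq_len, d_model))  # stand-in word embeddings

# Learned projections (random stand-ins here) map each embedding to a
# query, a key, and a value vector.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V. Each row of
# `weights` says how strongly one word attends to every other word,
# regardless of how far apart they sit in the sequence.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attended = weights @ V                   # context-aware representation
```

Because every word scores every other word directly, a pronoun at the end of a paragraph can attend to its referent at the beginning, which is what makes long-range relationships tractable.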
As the reader can appreciate, there is significant statistical uncertainty and risk of bias associated with the applications described above.14 After all, these models are only as good as the data used to train them. While DL may allow for course correction, it may also risk reinforcing incorrect patterns that the machine has learned, similar to how humans can develop biases based on limited or skewed information. However, these limitations should not preclude their use. Clinicians should instead gain a deeper understanding of what these algorithms can and cannot do, and know what to demand from administrators and startups that recommend new automated decision tools. In the next section, we describe the key limitations of AI in healthcare and how to address them.

IBM's Watson was one of the earliest widely known computer systems to leverage natural language processing and ML to recommend clinical treatment plans in oncology. By 2017, Watson had been trained on a vast amount of medical literature and data, including PubMed, the National Cancer Institute's Drug Dictionary, and the Sanger Institute's Catalogue of Somatic Mutations in Cancer database.15 However, despite a $62 million investment by MD Anderson Cancer Center, the Watson Oncology Expert Advisor failed to meet expectations and was ultimately abandoned.15 Billions more have since been poured into AI in healthcare, so far with poor returns on investment.16 The success of LLMs has nonetheless reignited interest in AI applications in healthcare. AI applications can produce erroneous or biased results for a variety of reasons. The quality and representativeness of the input data are a primary concern, as health data embed within them the biases and limits of the healthcare system from which they originate.
Underrepresented groups, for example, whether by race, gender, age, ethnicity, or immigration status, may be under- or overrepresented in data from certain zip codes, health systems, or insurance plans.17 Algorithms trained on such data will reflect the societal prejudices and disparities present in the data.17 These limitations can be particularly ominous if insurance coverage and treatment plans are determined by algorithms whose embedded biases are unknown and whose decision rules are opaque. There is a growing consensus that “black-box” algorithms, whose data and code cannot be accessed for validation, should not be allowed in medical decision making. Scientists and ethicists are calling for AI algorithms to be open source, so they can be examined and modified to address limitations as they emerge. Technological advances like differential privacy—a tool that introduces statistical noise into data sets, preserving privacy without compromising utility—can also make the data on which AI tools are trained more available for scrutiny. As with any new medical intervention, digital tools must also be required to state the potential (however inadvertent) harm that may arise from their use. LLMs, like physicians, do not like uncertainty. LLMs are, after all, prediction tools—they predict the most plausible continuation of texts, paragraphs, or documents in response to the queries posed to them, and sometimes present inaccurate information due to the inherent uncertainty of the task and the limitations of their training data. While a faltering human voice may reveal uncertainty or hesitation, these incorrect responses, called “hallucinations,” are currently hard to detect from an AI unless they are preposterous. In our own experience, we have found citations of papers that LLMs attribute to us, but which we have not actually authored.
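Differential privacy, mentioned above, can be illustrated with the classic Laplace mechanism: noise calibrated to how much a single person can change a query's answer is added to an aggregate statistic before release. The patient ages, the query, and the privacy parameter epsilon below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

ages = rng.integers(20, 90, size=1000)   # hypothetical patient ages
true_count = int(np.sum(ages >= 65))     # query: how many patients are 65+?

# The Laplace mechanism: add noise scaled to the query's sensitivity.
# Adding or removing one patient changes a count by at most 1.
sensitivity = 1.0
epsilon = 0.5    # privacy budget: smaller epsilon = more privacy, more noise

noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
noisy_count = true_count + noise
```

The released figure remains useful in aggregate while masking any single patient's contribution; real deployments also track a cumulative privacy budget across all queries answered from the same data set.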
Given the prevalence of hallucination across AI applications, we expect technological guardrails to emerge to address this issue.18 Electronic medical records—with staggering price tags—evolved primarily to address the complexity of insurance reimbursement, and to facilitate medicolegally sound documentation through mandated click-through screens and templates. Occasional nudges and reminders improved patient quality and safety.19 However, electronic medical records have also been identified as one of the chief contributors to physician burnout.3, 19 Sound design principles dictate that local workflows, clinician preferences, interoperability, and accessibility should be the foundational elements of electronic medical records, yet these crucial factors were not given the priority they deserved during the development and implementation of these systems. AI tools risk compounding the missteps of electronic medical records unless clinical and research training keeps pace with technological advancements. Without identifying the key bedside, research, and population health priorities that could benefit from AI tools, we risk suboptimal returns on investment. We recommend the following: (1) Integrate AI education into premedical and medical training to ensure that future clinicians are well versed in the basics of AI and its potential applications in healthcare. For clinicians already in practice, provide continuing medical education opportunities focused on AI to help them stay current with the latest developments and understand how AI can be leveraged to improve patient care and outcomes. Furthermore, offer specialization options through residency and fellowship training (as has been done with healthcare administration and management), or a certification pathway similar to the point-of-care ultrasound (POCUS) pathway in hospital medicine, to help create a generation of healthcare AI “bilinguals” who can bridge the gap between data science, computer science, and medicine.
These specialists will be crucial in driving the responsible development and implementation of AI tools in healthcare. (2) Develop strategic and proactive “deep” interdisciplinary teams at healthcare institutions and in our universities. Complex problems are best solved with interdisciplinary approaches and multidisciplinary teams. These teams should include domain experts, data scientists, ethicists, and patient advocates to ensure that AI tools are developed and deployed responsibly and effectively. By bringing together professionals with diverse expertise, we can better identify and prioritize the specific needs of healthcare institutions and ensure that AI solutions are designed to address those needs. This collaborative approach will help align AI development with institutional goals, leading to more targeted and impactful innovations that improve patient care, streamline workflows, and optimize resource allocation. Furthermore, the inclusion of ethicists and patient advocates in these teams will help ensure that AI tools are developed and deployed in a manner that is transparent, accountable, and respectful of patient privacy and autonomy.

AI, including DL, LLMs, and computer vision applications, presents an opportunity to develop a whole new generation of tools that may make discovery, clinical operations, and virtual assistance more feasible, accessible, and meaningful. Hospital medicine, being at the forefront of patient care, has a unique opportunity to steer the development of this new generation of AI tools in the service of our patients. By understanding the basics of AI, recognizing its potential and limitations, and working collaboratively across disciplines, we can harness the power of AI to improve patient outcomes and support clinicians in their work.

Dr. Shitij Arora has received grant support from the Amazon Web Services Health Equity Initiative. He is a member of the scientific advisory board at Healthipeople. Sunit P. Jariwala has received grant support from the NIH, AHRQ, Stony Wold-Herbert Fund, PCORI, American Lung Association, Price Family Fund, Genentech, AstraZeneca, Sonde Health, Aevice Health, and the Einstein CTSA/National Center for Advancing Translational Sciences; and has served as a consultant and/or member of a scientific advisory board for Teva and Sanofi. The remaining author declares no conflict of interest.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,393 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,259 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,688 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,502 citations