This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Artificial intelligence: international perspectives on critical issues
Citations: 0
Authors: 6
Year: 2025
Abstract
1. Introduction

This document presents a concise overview of the collaborative Basic Science Focus Forum (BSFF) and International Trauma Care Forum (ITCF) symposium held during the 2023 Orthopaedic Trauma Association (OTA) meeting in Seattle. The symposium's focus was “Artificial Intelligence: International Perspectives on Critical Issues.” A survey of attendees indicated unanimous consensus that clinical adoption of artificial intelligence (AI) is inevitable within the next decade, with profound implications for clinical practice. A significant number of participants also expressed concern that AI may produce inaccurate predictions or unsound advice, and they advocated for the OTA to take a central role in safeguarding orthopaedic trauma patients and surgeons against the inadvertent risks associated with AI. Discussions at the symposium spanned the current landscape of AI technology, its applications in preoperative and operative decision making, and the hurdles encountered in AI validation.

2. State of AI Technology

AI is a field of computer science that seeks to “automate intellectual tasks normally performed by humans.” There are many approaches to AI. Expert systems, which proliferated in the 1980s, attempt to achieve AI through a hard-coded, rules-based approach. However, most modern attempts at AI use machine learning (ML), in which machines learn directly from data without hard-coded rules. In ML, data are fed into a mathematical model, which performs some computation and then outputs a prediction. Traditional ML uses “shallow” models that involve only a few steps of computation between inputs and outputs and thus make very strong assumptions about the relationships between inputs and outputs (eg, linear and logistic regressions assume that the relationship is linear).
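The shallow-model workflow described above can be made concrete with a small sketch. The data are entirely hypothetical: a human-extracted imaging feature (here a Kellgren-Lawrence-style grade) paired with a knee function score, fed into a one-variable linear regression fit by ordinary least squares.

```python
# Toy illustration (hypothetical data): traditional "shallow" ML needs a
# human-extracted feature before the model can learn anything. Here we fit
# y = a*x + b by closed-form ordinary least squares.

def fit_linear(xs, ys):
    """Closed-form least-squares fit for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical cohort: a severity grade (0-4) extracted by a human reader
# from each x-ray, paired with a knee function score (higher = better).
grades = [0, 1, 2, 2, 3, 4]
function_scores = [90, 85, 75, 70, 60, 45]

a, b = fit_linear(grades, function_scores)
predicted = a * 3 + b  # predicted function score for a grade-3 knee
```

The model can only ever find a linear trend in the extracted grade; any signal in the raw pixels that the human grader did not encode is invisible to it, which is exactly the limitation deep learning removes.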
Even if there is a true association between the inputs and the output of interest, none will be found if the type of association does not match the core assumption of the model. Therefore, a critical step of traditional ML is “feature extraction,” in which a human places the data into a form suitable for learning by the model. For example, if you were trying to predict knee function scores from x-rays with linear regression, you would first extract one or more human-coded features from the x-ray that you think may have a linear association with the function score, such as the Kellgren-Lawrence grade. Deep learning is an ML subfield that uses very “deep” models known as neural networks. Neural networks are very complex mathematical functions with many intermediate steps of computation between inputs and outputs. This complexity makes these models very flexible and allows them to learn almost any association that may exist. Because of this, neural networks can operate directly on raw input data without feature extraction. In the abovementioned example of predicting knee function scores from x-rays, you would simply input the pixels of the x-ray as your input features. Deep learning can currently be split into 2 basic paradigms (Fig. 1). In “Narrow AI,” a deep learning model is trained on thousands of labeled examples to make a prediction of interest. For example, you may train a model to recognize hip fractures on hip x-rays. It has been consistently demonstrated that, with enough data, these models can achieve expert performance at a given task. These models are of course very useful for diagnosis and triage, but they can also perform a wide variety of other important tasks, including task automation (eg, automated Cobb angle measurement and assessment of femoroacetabular impingement), cognitive assistance (eg, intraoperative anatomy recognition), and many more. These models can even be used to generate new knowledge.
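The Narrow AI recipe, learning one task directly from labeled examples, can be sketched with a deliberately tiny stand-in: a nearest-centroid classifier on fake 4-pixel “images” takes the place of a deep network trained on thousands of labeled x-rays. Everything below (the data, the labels, the pixel values) is hypothetical.

```python
# Toy sketch of the Narrow AI recipe: learn one task from labeled examples.
# A nearest-centroid classifier stands in for a deep network (illustration
# only); the fake 4-pixel "x-rays" below are hypothetical.

def centroid(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]

def train(examples):
    """examples: {label: [feature_vector, ...]} -> {label: centroid}"""
    return {label: centroid(vecs) for label, vecs in examples.items()}

def predict(model, x):
    """Assign the label whose centroid is closest (squared distance) to x."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist(model[label]))

# Fake labeled training set: "fracture" images tend to have low pixel values.
examples = {
    "fracture":    [[0.1, 0.2, 0.1, 0.2], [0.2, 0.1, 0.2, 0.1]],
    "no fracture": [[0.8, 0.9, 0.8, 0.9], [0.9, 0.8, 0.9, 0.8]],
}
model = train(examples)
```

As in the text, the trained model can do exactly one thing: separate these two labels. Recognizing any other pathology would require a new labeled data set and a new model.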
For example, researchers at Stanford discovered that additional information about knee function in osteoarthritis (OA) must be present in the x-ray simply by training a model to predict KOOS scores from x-rays and showing that it explained much more of the variability than was possible using the typical measures of OA severity alone. However, Narrow AI has important limitations: it is narrow in application (a model trained to perform 1 task can perform only that one task); every new application requires large amounts of expensive, labeled data; and it often struggles to generalize to new data sets.

Figure 1: Artificial intelligence (AI) can be split into 2 basic paradigms: Narrow and Generative AI.

The other predominant paradigm is “Generative AI.” This type of AI is enabled by large language models (LLMs): huge neural networks, based on an architecture known as the transformer, that are trained on huge corpora of data to perform next-sequence prediction. The sequence is often text but can also be images (which are, after all, only sequences of pixels), video, audio, or even combinations of different modalities. The large size of both these models and the data they are trained on has resulted in some surprising emergent capabilities, including the ability to perform well on new tasks without any significant model fine-tuning. Thus, as opposed to Narrow AI, Generative AI models can be quite general purpose: the task is changed just by modifying the prompt submitted to the model. In addition, as mentioned above, they can be quite flexible in the type of data they work with. This has led to the development of LLMs that can perform a wide variety of tasks, such as text generation, summarization, and question answering. LLMs can also be used for medical applications, such as generating patient reports, triaging patients, and predicting outcomes.
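The next-sequence-prediction objective behind LLMs can be illustrated at miniature scale. A toy bigram counter stands in for a transformer (illustration only, hypothetical three-sentence corpus): it learns, for each token, which token most often follows it, which is the same prediction task an LLM performs over vastly larger contexts and corpora.

```python
# Minimal sketch of "next sequence prediction", the training objective
# behind LLMs, using a toy bigram model instead of a transformer.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            follows[cur][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the most frequent next token seen in training, or None."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]

# Hypothetical toy corpus.
corpus = [
    "the fracture was reduced and stabilized",
    "the fracture was fixed with plates",
    "the patient was discharged home",
]
model = train_bigram(corpus)
```

Scaling this idea up (contexts of thousands of tokens, billions of parameters, web-scale corpora) is what produces the emergent, general-purpose behavior described above; the bigram model, like Narrow AI, can only ever echo its training counts.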
However, Generative AI also has significant limitations, including the problem of “hallucinations.” In addition, it can be more difficult to authoritatively benchmark the performance of an LLM on a given task (ie, to know how well it is expected to perform).

3. Preoperative Decision Making—Distal Radius Fractures (RAIdius Project)

To date, AI-driven computer vision research in the field of radiology has focused on automated detection of pathoanatomy on plain radiographs, computed tomography (CT), and magnetic resonance imaging. In orthopaedic trauma, the number of deep learning models available for fracture detection and classification is rapidly increasing, such as the automated hip fracture detection published in Lancet Digital Health.1 For distal radius fractures (DRFs), such AI algorithms have also been successfully trained and validated. To date, 15 studies have published deep learning algorithms to detect and/or classify DRFs on radiographs or CT, including some that are now commercially available. However, the prediction of clinical outcomes such as DRF stability has not been the focus of any studies in this field to date. One could argue that the clinically relevant problem is not the mere presence of a DRF. The clinical challenge is to predict future loss of threshold alignment of such a DRF during follow-up, which is our main driver to offer patients surgery. An interpretable deep learning algorithm that predicts the probability of loss of threshold alignment (ie, DRF instability) on plain injury radiographs holds the promise to be a game changer in clinical care, as it would eliminate human (surgeon) biases and thus augment patient-centered shared surgical decision making. In general, treatment of DRFs is either conservative with a cast or surgical, most often by open reduction and internal fixation (ORIF) with plates and screws.
Treatment decisions differ by country, hospital, or even surgeon: “what you get depends on where you live, and who you see.” In Australia, patients are offered surgery in up to 80% of DRFs, while in the United States and the Netherlands, operative rates are 26% and 10%, respectively. One could argue that such variation is undesirable: some patients may be overtreated while others may be undertreated. Shared surgical decision making in the early treatment phase remains challenging because it is based on the surgeon's “art” of deeming a fracture stable or unstable. However, studies have shown that human estimation of the probability of future loss of threshold alignment of a DRF is fallible. An online experiment by the Science of Variation Group (Boston, MA) showed that surgeons accurately predicted redisplacement of a DRF based on plain injury radiographs in only 54% of patients with a reduced DRF, little better than a guess. In a subsequent experiment, surgeons evaluated both radiographs and injury CT scans of reduced DRFs: accuracy improved to 70% with the use of CT. In short, the diagnostic performance of human interpretation of injury radiographs to determine fracture stability of DRFs may be unsatisfactory for clinical practice. For this reason, a reliable and interpretable deep learning model that predicts loss of threshold alignment in DRFs with a higher accuracy than surgeons is of significant interest to reduce undesired treatment variation. Moreover, this approach may also shift AI-driven computer vision research from the rather simple detection of pathoanatomy to the prediction of clinical outcomes. Therefore, we included patients with DRFs from retrospective databases from 2 hospitals and one multicenter prospective database.
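Accuracy, sensitivity, and specificity figures such as those quoted in this section all come from the same confusion-matrix arithmetic. A short sketch with hypothetical counts (the split of 150 test cases below is assumed, not taken from the study):

```python
# Standard test-performance metrics from a 2x2 confusion matrix.
# The counts are hypothetical, assuming 75 of 150 cases were truly unstable.

def metrics(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)            # detected / all true positives
    specificity = tn / (tn + fp)            # cleared  / all true negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

tp, fn = 63, 12   # unstable DRFs correctly / incorrectly called stable
tn, fp = 51, 24   # stable DRFs correctly / incorrectly called unstable
sens, spec, acc = metrics(tp, fn, tn, fp)
```

Note that accuracy alone (eg, the surgeons' 54%) hides the trade-off between sensitivity and specificity, which is why models are usually reported with all three plus an AUC.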
Patients were required to have been treated primarily with a cast, to have complete radiological follow-up, and not to have received surgery while still “stable.” Trauma and reduction radiographs, both posteroanterior (PA) and lateral, were collected when available, as well as sex and age. To augment training of the algorithm, 21 landmarks (11 on PA radiographs, 10 on lateral) were manually annotated on the radiographs. A convolutional neural network (CNN) was trained on these radiographs and annotations. The algorithm was trained on 2136 radiographs (583 cases) and tested on 563 radiographs (150 cases), taking 50 patients from each cohort. Our model had an AUC of 0.83 and achieved 76% accuracy, 84% sensitivity, and 68% specificity in predicting future loss of threshold alignment of DRFs. In contrast to previous deep learning models that detect pathoanatomy, these results are promising because this is the first CNN algorithm in (orthopaedic trauma) surgery to make a prediction about future fracture alignment. With the rise of deep learning methodologies, in particular CNNs that can analyze images such as plain radiographs, such algorithms will aid surgeons in objectively quantifying the probability of loss of threshold alignment of DRFs.

4. Intraoperative Decision Making Using Augmented Reality—Use in Placement of Iliosacral Screws

Iliosacral screw placement is considered a complex procedure that requires a thorough understanding of the patient's sacral anatomy. The presence of sacral dysmorphism, osteoporotic bone, obesity, and overlying bowel gas makes it even harder to visualize the osseous fixation pathways on conventional perioperative fluoroscopy. Owing to the need for highly skilled surgeons, the large increase in osteoporotic pelvic fractures, and the benefit that many of these patients derive from minimally invasive fixation, it is becoming more difficult to provide appropriate care in time.
Navigation can be used to simplify this procedure so that complex sacral anatomy is no longer a barrier for the less experienced surgeon. These navigation systems, however, are very expensive and require some form of intraoperative CT imaging. The cost must be reduced and the necessary hardware simplified to make this a safe procedure in less experienced hands. Mixed reality (MR), augmented reality (AR), and extended reality (XR) are overlapping terms used to describe an environment in which real-world and computer-generated content overlap so that virtual and real objects can interact in real time. This overlap is usually created using augmented reality glasses (a headset) such as the Microsoft HoloLens, Google Glass, or the Apple Vision Pro. This means, for example, that a patient's deep anatomy that has previously been modeled using, for example, a CT scan can be overlaid onto the patient's skin. The result is improved visualization and a better understanding of the anatomy, and the overlay can also be used to plan and assess a surgical procedure. The AR workflow developed for the planning and placement of iliosacral screws is shown in Fig. 2. Preoperatively, a CT scan is performed to assess the fracture (step 1). During the CT scan, skin markers are attached to the patient; these are later used to register the 3D models to the patient. With the help of artificial intelligence, the segmentation (creation of 3D models from CT images) of the patient's anatomy and the planning of the screw trajectories occur in an automated way (step 2), which drastically speeds up the process. The planned screw trajectories can then be assessed and adapted in a multiuser viewer application (step 3), in either an augmented or a virtual environment. In the last phase, the 3D models and planned screw trajectories are projected onto the patient in the operating theater.
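The marker-based registration step in this workflow amounts to a rigid point-set alignment: given the skin-marker positions in CT coordinates and the same markers as seen by the headset, estimate the rotation and translation mapping one frame onto the other. A 2D least-squares (Procrustes) sketch using complex numbers is shown below with hypothetical marker positions; real systems solve the 3D version (eg, with an SVD-based Kabsch solver) and must cope with noise and mis-detected markers.

```python
# Marker-based rigid registration sketch (2D, noise-free, hypothetical data):
# recover the rotation + translation that maps CT-frame markers onto the
# headset-frame markers, via the least-squares Procrustes solution.
import cmath

def register(ct_pts, headset_pts):
    """Return (rotation, translation) aligning ct_pts onto headset_pts.

    Points are complex numbers; the rotation is a unit complex number."""
    n = len(ct_pts)
    c_ct = sum(ct_pts) / n            # centroids of each point set
    c_hs = sum(headset_pts) / n
    # Optimal rotation from the cross-correlation of centered point sets.
    corr = sum((q - c_hs) * (p - c_ct).conjugate()
               for p, q in zip(ct_pts, headset_pts))
    rot = corr / abs(corr)
    trans = c_hs - rot * c_ct
    return rot, trans

# Hypothetical markers in CT coordinates, and the same markers rotated 90
# degrees and shifted, as the headset would see them.
ct = [complex(0, 0), complex(10, 0), complex(0, 10)]
true_rot = cmath.exp(1j * cmath.pi / 2)
true_trans = complex(5, 2)
seen = [true_rot * p + true_trans for p in ct]

rot, trans = register(ct, seen)
```

With exact correspondences the transform is recovered exactly; in the operating theater the quality of this estimate is precisely the "registration accuracy" that the text identifies as the key challenge.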
Intraoperative fluoroscopy is used to verify the chosen trajectories.

Figure 2: Augmented reality (AR) workflow for planning and placement of iliosacral screws.

Surgical augmented reality applications and surgical navigation are complementary tools; they can be used together or separately. The main challenge for both technologies is adequate registration of the planned trajectories or projected models: if the registration is poor, the accuracy will be poor. Improving registration accuracy and turning registration into a robust, fail-safe procedure are key for this technology to succeed and for reducing its associated costs. Artificial intelligence can help make the registration more software-based than hardware-based. Currently, intraoperative CT remains important; hardware costs will also drop once accurate registration is possible without intraoperative CT. In conclusion, augmented reality can help us visualize complex anatomy, but technical improvements are still needed to improve accuracy. A robust registration method without intraoperative CT is potentially a game changer for cost reduction, and artificial intelligence can assist with this registration and automate the workflow, making it fast and easy to use.

5. AI for Outcome Prediction: Watson Health Trauma Pathway Explorer—A Teaching Tool for Polytraumatized Patients Using Visual Analytics

Polytraumatized patients face a spectrum of complications and adverse outcomes that preexisting conditions, comorbidities, specific injury patterns, and pathogenetic changes can influence. This complex landscape underscores the critical need for early assessment to identify risk factors for complications, which is essential both for patient care and for educating medical residents and young attending physicians. In response to this need, a new visual analytics tool was developed in collaboration with IBM over several years.
This tool was designed to predict special risk situations for complications using data from patient records, following approval from the local institutional review board, and a trauma database at a Level I trauma center. The inclusion criteria were patients older than 16 years with an Injury Severity Score (ISS) greater than 16.2 Throughout the development phase, parameters associated with the development of complications were meticulously assessed. These parameters included patient age, abbreviated injury scales, ISS, ATLS shock classification, surgical strategy decisions, and various physiological measurements on admission. Daily monitoring of various laboratory values and transfusion volumes was also documented. The primary end points were early in-hospital mortality (within 72 hours), sepsis, and systemic inflammatory response syndrome (SIRS). IBM's Watson, an AI program, was used for the pathway development: Watson created projections of the clinical course, which were then visualized using a Sankey analytics tool. Sankey diagrams, named after the Irish captain Matthew Henry Phineas Riall Sankey, visually represent flow data, with the width of each drawn flow proportional to its volume (Fig. 3a and b).

Figure 3: An illustrative scenario showcasing the impact of age on the risk of sepsis. Patients older than 65 years had a 33% risk of developing sepsis (A), while patients aged between 29 and 65 years had a risk as low as 2% (B).

The initial data set for the first stage of our project included 3655 patients. The final data set included 1925 patients after stratification and the exclusion of 1730 due to incomplete data. We leveraged the Watson Health Trauma Pathway Explorer to display parameters, individual patient data, surgical pathways, ATLS groups of hemorrhage, and outcomes. Each aspect could be modified according to individual data sets and the desired outcomes.
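Underneath any Sankey diagram is simple aggregated flow data: each (source, target) pair carries a volume, and the drawn width is proportional to that volume. A minimal sketch with hypothetical, simplified patient records (age group to sepsis outcome) shows the aggregation step; a plotting library would then render the counts as ribbons.

```python
# Build Sankey flow data (source, target, volume) from patient records.
# The records below are hypothetical and grossly simplified.
from collections import Counter

def build_flows(records, stages):
    """Count transitions between consecutive stages across all records."""
    flows = Counter()
    for rec in records:
        path = [f"{stage}={rec[stage]}" for stage in stages]
        for src, dst in zip(path, path[1:]):
            flows[(src, dst)] += 1
    return flows

records = [
    {"age": ">65", "outcome": "sepsis"},
    {"age": ">65", "outcome": "no sepsis"},
    {"age": ">65", "outcome": "sepsis"},
    {"age": "29-65", "outcome": "no sepsis"},
    {"age": "29-65", "outcome": "no sepsis"},
]
flows = build_flows(records, ["age", "outcome"])
```

Adding more stages (eg, ATLS shock class, surgical strategy) to the `stages` list is what turns this into the multi-step clinical pathways the tool displays.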
These interactive Sankey diagrams allowed users to explore clinical pathways based on real-world data and to support decisions in critical situations, making the tool a teaching aid for trauma care. In the illustrative scenario, patients older than 65 years had a 33% risk of developing sepsis, while patients aged between 29 and 65 years had a risk as low as 2% (Fig. 3a and b). The strength of visual analytics lies in combining the computational power of machines with the perceptual and cognitive abilities of humans: it acts as a bridge between the user and the data, supporting the exploration of raw data, the generation of hypotheses, the testing of scenarios, and the creation of new knowledge. Visual pathway analytics enables models of clinical pathways to be built despite the volume and heterogeneity of clinical data, which often sit in systems that are mere repositories for the required information. Starting from a data set, the tool applies algorithms to medical and other necessary data and displays the resulting pathways as a flow chart, a Sankey diagram, which allows treatment courses to be compared and pathways to favorable or adverse outcomes to be traced. In this sense, the Watson Health Trauma Pathway Explorer is an important step toward clinical AI.

6. AI for Outcome Prediction: A Machine Learning Mortality Index for Patients With Trauma

Vital signs and laboratory values have long been regarded by trauma surgeons as reliable predictors of mortality, and various scoring systems have been developed to identify the patients at highest risk. Such systems are used to support clinical decisions, including judgments about patient stability. However, applying these scores in practice presents challenges, and questions remain regarding their validation in the populations in which they are used. The index presented at the symposium was developed to address these limitations: it is a machine learning algorithm that uses data collected in the medical record, eliminating the need for surgeons to manually input data.
This algorithm functions within the medical record, updating mortality predictions over the initial period of a hospitalization and adapting with every update to changes in the patient's condition. The model was developed with data from trauma patients treated over a multiyear period, was validated on a separate patient cohort, and had its performance evaluated after 1 year of clinical use. In that first year, the final model accurately predicted mortality and successfully predicted survival, with sensitivity, specificity, and area under the receiver operating characteristic curve that represented an improvement over models based on traditional trauma scores, avoiding many of the limitations found in those models. Such a tool can potentially improve decision making for trauma patients early in their hospital course, and ongoing efforts are focused on integration with other medical systems and on additional applications.

The integration of artificial intelligence (AI) into clinical practice is expected to occur within the next decade, with profound implications for patient care. The symposium concluded that AI algorithms could be applied at each stage of clinical care: applications of Narrow AI range from preoperative planning and fracture detection to surgical navigation, intraoperative support for surgical decision making, and prediction of complications, while advances in Generative AI are expected to automate tasks such as virtual patient communication, automated documentation, and automated preoperative planning. Attendees called on organizations such as the Orthopaedic Trauma Association (OTA) to guide the integration of AI into clinical practice, to monitor the performance and safety of AI technology in orthopaedic trauma, and to support the creation of shared orthopaedic trauma data sets for AI training and of performance benchmarks for new models.