OpenAlex · Updated hourly · Last updated: 19.03.2026, 11:01

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

CORR Insights®: Does Artificial Intelligence Outperform Natural Intelligence in Interpretation of Musculoskeletal Radiological Studies? A Systematic Review

2020 · 0 citations · 1 author · Clinical Orthopaedics and Related Research · Open Access


Abstract

Where Are We Now?

Machine learning, and artificial intelligence more generally, are quickly growing areas of applied medical decision-making research. Compared with what are now considered more-traditional analytical approaches, such as statistical prediction models, machine learning is seen as providing unique advantages; in particular, it may improve healthcare delivery because it can learn from millions of digitized patient charts or images, and so provide robust, reproducible, and rapid decision-support tools [8, 14]. Artificial intelligence has already transformed many aspects of daily life outside health care; machine-learning algorithms allow us to translate large pieces of text into any language, recognize speech, drive a car, make a plane take off or land, or detect banking fraud. The advantages of machine learning include the ability to analyze enormous amounts of data, to capture complex nonlinear relationships among these data, and to consider a wide range of data types. It can handle structured data, as other statistical prediction methods do, but it can also analyze free text and images, as well as high-frequency sampled data streams such as those produced by wearable devices. So far, no approach other than artificial intelligence and machine learning has made the analysis of such data possible.

These approaches have begun to show promise in orthopaedic surgery specifically. For example, one recent study used machine learning to predict whether patients would achieve clinically important improvements in validated outcome scores 2 years after joint arthroplasty [5, 10], which is important in light of the fact that even experienced surgeons' abilities to make this sort of prediction for patients undergoing TKA are no better than a coin toss [6].

However, despite the hype and hopes about artificial intelligence, more-nuanced opinions have emerged [1, 13]. Identifying associations among data does not prevent confounding, and this may keep us from translating modifiable factors flagged by algorithms into real targets for intervention. Additionally, despite the underlying idea that the more data we have to train an algorithm, the more accurate it becomes, the hunger for more data does not always translate into more-accurate predictions [1]. Predicting what will occur in 1, 5, or 10 years may be difficult because even all past data, not just the data available to us, may not contain sufficient information about the future. This may explain why machine-learning algorithms have often outperformed human experts in imaging or diagnostics, where most of the information is present in the data being analyzed [8]. Their advantage over more-classic statistical models for longer-term risk prediction modeling is likely less evident [2].

In this issue of Clinical Orthopaedics and Related Research®, Groot et al. [7] performed a systematic review comparing artificial intelligence and human intelligence in interpreting musculoskeletal imaging. In analyzing a final sample of 12 studies, they found that machine-learning models performed comparably to clinicians in terms of accuracy, sensitivity, and specificity (the area under the receiver operating characteristic curve was not reported by most of the studies). This result is similar to that of a recent review that was not focused on musculoskeletal imaging [11].
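As a point of reference, the sketch below shows how the metrics compared in these reviews are typically computed. It is a minimal illustration in Python with scikit-learn; the labels, the ~85% clinician agreement rate, and the model's score distribution are all invented for this example and are not taken from Groot et al. [7].

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical ground truth (e.g., abnormality present = 1), a clinician
# giving binary calls, and a model giving continuous risk scores.
y_true = rng.integers(0, 2, size=200)
clinician = np.where(rng.random(200) < 0.85, y_true, 1 - y_true)  # ~85% agreement, invented
model_prob = np.clip(0.7 * y_true + rng.normal(0.15, 0.2, 200), 0.0, 1.0)
model_pred = (model_prob >= 0.5).astype(int)

for name, pred in (("clinician", clinician), ("model", model_pred)):
    tn, fp, fn, tp = confusion_matrix(y_true, pred).ravel()
    print(f"{name}: accuracy={accuracy_score(y_true, pred):.2f}, "
          f"sensitivity={tp / (tp + fn):.2f}, specificity={tn / (tn + fp):.2f}")

# The AUC requires the model's continuous scores, which is one reason studies
# that report only binary outputs cannot report it.
print(f"model AUC: {roc_auc_score(y_true, model_prob):.2f}")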
It is important to note that all of the studies in Groot et al.'s [7] review suffered from important methodologic shortcomings and problems with scientific reporting; none adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) reporting guidelines [4]. No study was a prospective, randomized controlled study, which is the gold-standard design for evaluating any clinical benefit that would support the use and reimbursement of an algorithm. This is in line with the findings of another recent review comparing artificial intelligence and clinicians, which found only one randomized controlled trial among 91 completed or ongoing studies [13]. Groot et al. [7] did not specifically evaluate the risk of bias using (for example) the Prediction Model Risk of Bias Assessment Tool [15], but they did exclude studies with an unclear evaluation of ground truth (that is, the reference standard on which the algorithm is trained or tested), and the 12 included studies met six, but not all eight, of the critical appraisal items. In a review by Nagendran et al. [13], the risk of bias was assessed as high for most studies, particularly because of a high risk of bias in the analysis domain. Still, the findings of Groot et al. [7] are exciting because they provide empirical evidence on the comparison between artificial intelligence and clinicians for interpreting orthopaedic imaging, beyond opinion or success stories.

Where Do We Need To Go?

Claims that artificial intelligence is superior to clinicians may be overstated [7, 11, 13], although performance may have been superior in several individual studies, and the ability of artificial intelligence algorithms to quickly analyze large numbers of images may also be considered an advantage. Most of all, the systematic reviews cited above [7, 11, 13] highlight that the reporting of studies that develop and validate artificial intelligence algorithms for medical imaging should be improved, and that studies properly evaluating the utility of these algorithms in the real world are lacking. Good scientific reporting is important because when a study is poorly reported, it becomes impossible to evaluate with confidence what was done; stated another way, it is hard to know whether the study was designed and conducted in a way that makes the results trustworthy, or whether the findings should be applied in clinical practice. This is even more important for studies using artificial intelligence and machine learning, in which methods and terminology (for example, deep neural networks, gradient boosting, and ensemble learning) may be less familiar to readers, who may fail to understand what was done or whether the approaches are correct. It is therefore important that readers become accustomed to this new field; good guides for this are available [9, 12]. However, it is also the responsibility of authors, and ultimately of journal editors, to report studies completely and appropriately; the TRIPOD guideline is important to follow, although it is not specific to machine learning or artificial intelligence [4]. In addition, tools to evaluate the risk of bias of studies are important for interpretation [15]. Transparent reporting and an understanding of the epidemiologic, statistical, and computing methods greatly facilitate such an evaluation. Finally, a fair evaluation of the clinical impact of using artificial intelligence in medicine is needed.
This is true in the diagnostic setting (for example, with imaging), but it applies equally to other types of decision-support tools. Indeed, a machine-learning algorithm may perform well and deliver a good area under the curve, yet bring little or no clinical benefit in practice (see the sketch at the end of this section). The use of a machine-learning algorithm in routine clinical practice may face acceptance issues among surgeons or patients, and it may be difficult to integrate this type of algorithm at the point of care. It also might not change decision-making, given the many other factors involved. Implementation of any decision-support tool in clinical practice could even produce unexpected adverse effects [13], and this should also be evaluated.

How Do We Get There?

The application of artificial intelligence in medicine is relatively new. This fast-moving field may improve healthcare, and ultimately patient outcomes. As with any innovation, however, it may take some time for the field to establish methodologic standards for the different types or stages of tool development, particularly by matching the standards of mathematics, informatics, and data science to those of medicine, epidemiology, and public health. Progress has been made; tools already exist, and new ones are being developed. For instance, an extension of the TRIPOD guideline for machine learning (TRIPOD-ML) is expected soon [3]. In terms of the evaluation of clinical benefit, I hope that more randomized controlled trials will be conducted, but without incentives from the FDA or other regulatory agencies, this may be difficult to achieve.
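To make the earlier point concrete that a good area under the curve does not guarantee clinical benefit, here is a minimal sketch of decision curve analysis, a method not used in this editorial but commonly applied for exactly this question: net benefit compares acting on a model's predictions against the default strategies of treating everyone or no one at a given risk threshold. All data here are simulated for illustration only.

import numpy as np

def net_benefit(y_true, prob, threshold):
    # Net benefit of intervening when predicted risk >= threshold:
    # true positives minus false positives, weighted by the threshold odds.
    n = len(y_true)
    act = prob >= threshold
    tp = np.sum(act & (y_true == 1))
    fp = np.sum(act & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)                                   # simulated outcomes
prob = np.clip(0.5 * y + rng.normal(0.25, 0.2, 1000), 0.01, 0.99)   # simulated risk scores

for t in (0.1, 0.3, 0.5):
    nb_model = net_benefit(y, prob, t)
    nb_all = net_benefit(y, np.ones(1000), t)                       # "treat everyone"
    print(f"threshold {t:.1f}: model {nb_model:+.3f}, treat-all {nb_all:+.3f}, treat-none +0.000")

A model only adds value at thresholds where its net benefit exceeds both default strategies; a high AUC does not guarantee this, which is one reason a fair evaluation of clinical impact, ideally in a randomized trial, remains necessary.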


Topics

Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · Advanced X-ray and CT Imaging