This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Ethical and inclusive challenges of machine learning in anaesthesia
Citations: 0
Authors: 2
Year: 2025
Abstract
The expanding capacity of machine learning and artificial intelligence to revolutionise every industry raises numerous philosophical challenges, demanding prudent and rigorous processes as well as ethical scrutiny before implementation. The human-machine interface challenges many ethical and behavioural assumptions while creating tensions. How is human behaviour impacted by machine learning? Who bears the responsibility for machine learning-associated harmful conduct and outcomes? Artificial intelligence has enormous capacity to aid in clinical decision-making, to care for patients and to streamline clinical workflow [1-3]. However, its ease of use, and the obscurity with which it performs its arcane calculations to produce mysterious and wonderful results, give rise to risks that artificial intelligence may harm real patients, marginalise people from equity-deserving groups in modern healthcare and dehumanise the people we intend to help. Theoretical and philosophical frameworks can be used as ways of seeing, and potentially not seeing, clinical questions through a different lens.

Kotzé et al. present machine learning models to risk stratify patients referred from primary care in the UK [4]. The automated machine learning model comprised demographic data, medications, allergies and a dichotomised ASA physical status, classifying patients as being at low risk (ASA 1–2) or high risk (ASA 3–5) of peri-operative morbidity and mortality [4]. Their algorithm helps to streamline and expedite the referral process, addressing priorities set out by the NHS. Streamlining this referral process speaks to the neoliberalisation of healthcare, where principles of efficiency and fiscal responsibility run through healthcare institutions. The study shows a means of saving time on resource-intensive manual tasks and of cutting costs by shifting some of the manual labour from peri-operative care teams to automated processes, thereby allowing the clinical team to focus their time on patient optimisation [4]. Though efficiency has its place in ensuring that more patients obtain swift access to healthcare, these tools carry potential consequences.

Kotzé et al. should be applauded for their use of prospectively collected, de-identified administrative and clinical data, which were anonymous to the research team [4]. Application of the national data opt-out respected patient autonomy and agency. Data sharing agreements and contracts were determined a priori, and a data access committee reviewed the data access request. The authors also embedded the Index of Multiple Deprivation decile in model training to capture the impact of socio-economic status, and took careful steps to avoid data leakage, which can lead to optimism bias (i.e. performance that is not maintained in a real-world setting).

This editorial focuses on machine learning as a tool used by moral agents, rather than on building artificial moral agents. It is not intended as a panacea for all ethical questions raised by machine learning, but rather as general forethought when designing and implementing it, drawing on Kantian principles to ensure these applications are designed as tools that respect patient autonomy, uphold the intrinsic value of humans and avoid magnifying inequities. Physicians agree to conform to defined medical standards and principles. Though advances in technology are starting to revolutionise medicine with the introduction of machine learning and artificial intelligence, physicians must continue to do no harm.
This pledge aligns with Kant's philosophy of treating patients with respect and dignity. Table 1 shows Kant's categorical imperative formulations, mobilised as one of many potential lenses through which to discuss philosophical questions applicable to machine learning.

Table 1 Kant's categorical imperative formulations with clinical examples.

First formulation (universalisability). A rule based on intrinsic value or, as Kant described it, one in which 'the action is represented as good in itself and therefore as necessary' [5]. We should act according to rules that we ourselves would want others to follow; if negative consequences arise from the ways in which we act, then the act would be considered morally wrong. Before acting, one should ask: what if everyone else were to do this action? What impact would it have? Kant's first categorical imperative describes how individuals can identify universalisable rules rather than why. Clinical examples: applying the same standard of care for all patients, irrespective of healthcare insurance coverage; taking a pre-anaesthetic history and physical examination; and, as anaesthetists, obtaining informed consent by discussing both the benefits and the risks of each anaesthetic delivered.

Second formulation (upholding human dignity). People's autonomy, dignity and value should be respected; people should never be used for others' selfish purposes. Clinical examples: when a patient with a terminal illness refuses care, healthcare professionals should understand the reasons why and should honour and respect the patient's decision; the anaesthetic plan should be co-constructed with the patient, respecting the patient's goals; and the patient's body should not be exposed when not required.

We do not insist that Kant's categorical imperative is the only lens through which to look at machine learning tools. Kant holds that individuals cannot violate moral law, unlike consequentialism, which may justify potential harm for the greater good of humankind. We do, however, insist that Kantian philosophy can be mobilised as an ethical framework, providing clinicians, researchers and programmers with a guide to human dignity when designing, developing and deploying machine learning and artificial intelligence systems.

In machine learning, bias is defined as a systematic decision-making error which drives an artificial intelligence algorithm to produce differential or inequitable outcomes [6]. These outcomes can occur in people with protected characteristics (e.g. age, sex or disability) [7], that is, equity-deserving groups, and can pertain to other demographic characteristics, such as gender; race; ethnicity; sexuality; socio-economic status; education; and income level. These systematic decision-making errors and biases can be even more pronounced for people who self-identify with intersectional identities. Bias can occur in the context of the predictive algorithm itself (i.e. due to bias introduced in model training) and during human-artificial intelligence interaction, determined by the cognitive processes involved in whether and why human decision-makers override or agree with algorithmic decisions. The latter can be caused by increased cognitive load while using artificial intelligence-powered systems (e.g. because of a poorly designed user interface or increased documentation burden) [8]; clinician mistrust in the model's output (e.g. the model is not interpretable, or an experienced clinician places greater emphasis on their own judgement); or the user's cognitive biases (e.g. choosing to follow artificial intelligence recommendations for certain patients but overriding them for others).
When dealing with the design of predictive algorithms, bias can be introduced with any decision made during model development. It can be embedded from the beginning or during the framing of the problem. For example, Kotzé et al. chose to dichotomise ASA physical status and to use 30-day mortality as an outcome [4]. Though based on published literature, these decisions can cause a butterfly effect, where small changes in initial inputs cause significant and unpredictable changes in model training and outputs [9].

Most machine learning analyses are retrospective, relying on clinical data collected for other purposes. This retrospective analysis can introduce several sources of bias, including, but not limited to, unbalanced training datasets leading to representation bias for certain races or outcomes [10]. Despite the introduction of reporting guidelines for different types of machine learning analyses, many studies show an overreliance on performance metrics that relate to the whole cohort, meaning predictions are accurate for well-represented groups but limited in those that are under-represented [11]. In the dataset used by Kotzé et al., the proportions of people aged ≥ 80 y and of those in Index of Multiple Deprivation quintile 4 were much lower than for other age or deprivation groups [4]; the algorithm has fewer of these data from which to learn, leading to a higher likelihood of generalisations and inaccurate predictions. Such inaccurate predictions have been seen in other spheres; for example, facial recognition algorithms with limited training data have been shown to misclassify 35% of women with darker skin, compared with only 0.8% of White men [12].

Healthcare data are noisy and messy. One example of messiness is missing data, which may not be missing at random; for example, patients living in under-resourced communities may be less likely to interact with healthcare services and thus have more missing data (e.g. diagnosis codes and medication histories) than those from high-income groups [13]. The ways in which researchers handle these data can have profound implications for model training. For example, many algorithms require evenly sampled data with an equal number of readings per feature, which does not reflect reality, where some data are sampled minute by minute (e.g. vital signs in critical care) and others once per day (e.g. laboratory investigations). How researchers choose to fill in the gaps and impute the data to weight and engineer features may alter their distribution, moving it away from the population it is intended to sample. Imputation can also degrade the performance of artificial intelligence algorithms when they are applied to real-world patient data, which may differ significantly from the data used in training [14].

Contextual data are often key to decision-making (e.g. examination findings, mobility and functional assessments) and may be incorporated into models using natural language processing. However, clinical notes are often sparse, may not accurately capture clinical findings and may reflect the cognitive biases of the writer [15]. Similarly, in the context of supervised learning, where the model learns a ground truth determined by a human operator (e.g. identifying cancerous vs. benign lesions), the algorithm may learn and perpetuate these cognitive biases or misclassifications [16].
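The overreliance on whole-cohort performance metrics described above can be probed with a simple subgroup audit. The following is a minimal, illustrative sketch, not the pipeline used by Kotzé et al. [4]: it fits a toy risk model on synthetic data and reports discrimination (AUROC) separately for each age band. The column names, model and data are hypothetical placeholders.

```python
# Minimal sketch of a subgroup audit: whole-cohort AUROC can mask poor
# discrimination in under-represented groups. All columns and data are synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "age_band": rng.choice(["18-64", "65-79", ">=80"], n, p=[0.6, 0.3, 0.1]),
    "imd_quintile": rng.integers(1, 6, n),      # 1 = most deprived (assumption)
    "asa_high_risk": rng.integers(0, 2, n),     # dichotomised ASA (assumption)
    "n_medications": rng.poisson(4, n),
})
# Synthetic outcome for illustration only
logit = 0.4 * df["asa_high_risk"] + 0.1 * df["n_medications"] - 2
df["morbidity"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = pd.get_dummies(df.drop(columns="morbidity"))
X_train, X_test, y_train, y_test, df_train, df_test = train_test_split(
    X, df["morbidity"], df, test_size=0.3, random_state=0, stratify=df["morbidity"]
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
df_test = df_test.assign(pred=model.predict_proba(X_test)[:, 1])

print("overall AUROC:", round(roc_auc_score(df_test["morbidity"], df_test["pred"]), 3))
for group, sub in df_test.groupby("age_band"):
    if sub["morbidity"].nunique() == 2:         # AUROC undefined with one class
        print(group, "n =", len(sub),
              "AUROC =", round(roc_auc_score(sub["morbidity"], sub["pred"]), 3))
```

The same idea extends to deprivation quintiles, ethnicity or any other recorded characteristic; estimates from small subgroups should be reported with confidence intervals, since they are the groups most likely to be under-represented in training data.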
Medical artificial intelligence literature can also exhibit publication bias. For example, more than half of all published clinical artificial intelligence models use data from the USA or China, meaning these populations and certain research organisations are over-represented in the literature [17]; the MIMIC dataset, which has been used in 2769 critical care publications to date [18], is one example of such over-representation. Data from countries without electronic health records are sparse, and international data sharing initiatives should be prioritised to mitigate this. Furthermore, as in other domains, artificial intelligence research is biased towards positive results, meaning the limitations of artificial intelligence technology are poorly understood.

Machine learning brings potential ethical, practical and theoretical concerns, with the possibility of unintended harms. The mechanisms by which machine learning may deepen inequities and harm patients are manifold: algorithms are based on historical data, with all of their inherent biases; people from equity-deserving groups, often with intersecting social identities, are rarely represented in the literature, generating less data, if any, on which to base algorithms and posing potential representation and aggregation biases [19]; and the black box of artificial intelligence algorithms is amoral, working in ways that are often unknown even to developers. When machines learn from data that do not represent patients from different equity-deserving groups, and when those data are generated from historical understandings of individuals plagued by discriminatory and racist notions, we should not be surprised when these algorithms magnify inequities [20]. The potential to perpetuate, and even amplify, inequities is at odds with Kant's categorical imperatives.

However, the discrimination need not be based on overt biases. Machine learning may be programmed with a facially neutral practice that can nonetheless harm people from certain equity-deserving groups, a practice that has been termed proxy discrimination. This occurs when a facially neutral trait is used as a substitute for a prohibited trait [21]. For example, a postcode appears to be a facially neutral categorisation, but postcodes are often used as proxies for race and ethnicity. Such facially neutral traits have been used as a means of circumventing antidiscrimination laws while still discriminating against people from certain groups. In medicine, proxy discrimination can result in inequitable access to healthcare: for example, if a postcode suggests lower socio-economic status, patients may receive less timely treatment than people living in more affluent locations. From a Kantian perspective, such proxy discrimination perpetuates structural injustices that are in direct tension with Kant's maxim of universalisability.

Although machine learning has the potential for powerful supplementation that aids human efficiency and precision, it has also led to harm in numerous industries. Machine learning recruitment tools have discriminated against women because the algorithms were created with data favouring curricula vitae from men [22]; chatbots have become racist after being trained on social media data [23]; and photo applications that had been trained to recognise individuals with lighter skin have labelled a photo of two Black people as gorillas [24].
Because machine learning is presented as a scientific endeavour that humans simply cannot understand, these inaccuracies and tendencies towards discrimination are disguised, with some people labelling such mistakes and marginalisations as instances of "the computer… just doing its job" [25]. But machine learning is not neutral, infallible or immune to the biases to which humans are vulnerable. It is susceptible to making mistakes and to engaging in discrimination and racism, which is all the more dangerous because it hides its discrimination behind the legitimacy of complex computer algorithms; this lack of transparency thus disrespects the law of universalisability. According to Kantian philosophy, machine learning should never prioritise efficiency over patient welfare.

One could argue that machine learning is not an ethical machine in itself, but rather a tool that should be used ethically. Just as a scalpel can be used to make an incision for life-saving surgery or to harm a person without benefitting them, physicians must use these tools ethically. People may then argue that it should not matter whether ethics are built into machine learning, and that the onus falls on physicians to use these tools ethically. Should patients, then, be informed when these tools are used? Mello et al. recommended that if both the risk of physical harm and the opportunity for patient agency in response to a disclosure are high, then healthcare workers are obliged to disclose [26]. However, if both risk and agency are low, the use of artificial intelligence need not be disclosed [26].

The issue lies in the black box nature of machine learning. Unfamiliar with the algorithms used and the data upon which they have been constructed, physicians are unable to see how the tools make decisions. The nature and complexity of these tools obscure clinicians' ability to engage in unbiased exploration of, and reflection on, how to use them ethically. Indeed, sometimes the tools are so flawed that it is impossible to use them in a way that avoids the built-in discrimination, because the tools perform their sorting and categorising, valuing and devaluing, all behind the scenes and obscured from the operator. How do we then make these tools non-discriminatory to avoid doing harm, and how do we build structures so that they promote equity rather than magnify inequity?

Machine learning can be used as a tool to support peri-operative physicians in delivering better, personalised patient care: what has been termed 'augmented intelligence', rather than a 'fully-autonomous, rule-generating approach' in which machine learning, independent of a human, creates its own rules and conduct. Machine learning functions through input–output relationships and cannot engage in reflective, rational reasoning on its own. Rather than creating fully autonomous machines, an augmented intelligence approach may not only preserve moral agency but also enable shared decision-making with the patients to whom we have a moral and ethical obligation to provide optimal care. This approach abides by the Kantian respect for the rational agency of patients by allowing them to decide if and how their data will be used in machine learning algorithms. Are machine learning tools dehumanising, reducing people to items and numbers? If a tool is too crude, inequities can be promoted or even accentuated.
As a tool, machine learning must respect and centre patient dignity first and foremost. When used, it must serve as an aid to physicians' clinical judgement. Physicians are responsible for ensuring that the patient presenting to them is treated as a human being worthy of dignity, not solely as an object for optimisation, and that the patient provides informed consent to having their data used for machine learning research while their privacy is respected. Machine learning has the potential to do harm but, as a tool, it can also benefit patients. Kantian philosophy reminds us of the importance of rational morality; although machine learning is here to stay, it should augment and not replace physicians, while expanding our clinical armamentarium for moral and ethical reflection.

The authors thank Dr Ariel Lefkowitz for his thoughtful review given his expertise in philosophy. GL is supported by a Merit Award courtesy of the Department of Anesthesiology and Pain Medicine, University of Toronto, and the Department of Anesthesia and Pain Management, University Health Network – Sinai Health System. GL is an Editor of Anaesthesia. No other competing interests declared.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,239 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,095 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,463 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,428 citations