VI. Recommendations for common ethical standards for trustworthy AI
The preceding discussion in the section entitled “Potential impact of AI on the doctor-patient relationship” concluded that ethical standards need to be developed around transparency, bias, confidentiality, and clinical efficacy to protect patient interests in informed consent, equality, privacy, and safety. Together, such standards could serve as the basis for deployments of AI in healthcare that help rather than hinder the trusting relationship between doctors and patients. These standards can address both how systems are designed and tested prior to deployment and how they are implemented in clinical care routines and institutional decision-making processes.
The Oviedo Convention acts as a minimum standard for the protection of human rights which requires translation into domestic law. On this basis, there is an opportunity to make specific, positive recommendations concerning the standard of care to be met in AI-mediated healthcare. These recommendations must not interfere with the exercise of national sovereignty in standard setting through domestic law and professional bodies as detailed in Article 4 of the Oviedo Convention. However, it is also possible to set standards which do not interfere with Article 4 and can be considered directly enforceable. Specifically, as noted by Andorno:
“The common standards set up by the Council of Europe will mainly operate through the intermediation of States. This does not exclude of course that some norms contained in the Convention may have self-executing effect in the internal law of the States having ratified it. This is the case, for instance, of some norms concerning individual rights such as the right to information, the requirement of informed consent, and the right not to be discriminated on grounds of genetic features. Prohibition norms can also be considered to have immediate efficacy, but in the absence of legal sanctions, whose determination corresponds to each State (Article 25), their efficacy is restricted to civil and administrative remedies.”
Where AI can be observed to have a clear impact on rights and protections set out in the Oviedo Convention, it is appropriate for the Council of Europe to introduce binding recommendations and requirements for signatories concerning how AI is deployed and governed. Recommendations should focus on a higher positive standard of care with regard to the doctor-patient relationship to ensure it is not unduly disrupted by the introduction of AI in care settings. Of course, such standards should allow a degree of local interpretation around key normative issues like acceptable degrees of automation bias, acceptable trade-offs in outcomes between patient groups, and similar areas influenced by local norms.
The following example recommendations detail possible essential requirements for an intelligibility standard that aims to protect informed consent in AI-mediated care, a transparency standard based on a public register of medical AI systems, and a standard for the collection of sensitive data for purposes of bias testing. Each should be treated as an example of the type of recommendation that can be drawn from the preceding discussion of the potential ethical impacts of AI on the doctor-patient relationship.
Intelligibility requirements for informed consent
According to the Explanatory Report, Article 5 of the Oviedo Convention contains an incomplete list of information that should be shared as part of an informed consent process. As this list is incomplete, the Council of Europe could set standards for what information about an AI system’s recommendations concerning a patient’s diagnosis and treatment should be communicated to the patient, and how. Given the traditional role of the doctor in sharing and discussing this type of information in clinical encounters, these standards should likewise address the doctor’s role in explaining AI recommendations to patients and how AI systems can be designed to support the doctor in this role.
Several concepts are common across the questions and goods that motivate interpretability in AI. Interpretability methods seek to explain the functionality or behaviour of the ‘black box’ machine learning models that are a key component of AI decision-making systems. Trained machine learning models are ‘black boxes’ when they are not comprehensible to human observers because their internals and rationale are unknown or inaccessible to the observer, or known but uninterpretable due to their complexity. Interpretability in the narrow sense used here refers to the capacity to understand the functionality and meaning of a given phenomenon, in this case a trained machine learning model and its outputs, and to explain it in human understandable terms.
‘Explanation’ is likewise a key concept in AI interpretability. Generically, explanations in AI relate ‘the feature values of an instance to its model prediction in a humanly understandable way’. This rough definition hides significant nuance. The term captures a multitude of ways of exchanging information about a phenomenon, in this case the functionality of a model or the rationale and criteria for a decision, to different stakeholders.
To understand how ‘explanation’ can be operationalised in medicine, two key distinctions are relevant:
- First, methods can be distinguished in terms of what it is they seek to explain. Explanations of model functionality address the general logic the model follows in producing outputs from input data. Explanations of model behaviour, in contrast, seek to explain how or why a particular behaviour exhibited by the model occurred, for example how or why a particular output was produced from a particular input. Explanations of model functionality aim to explain what is going on inside the model, whereas explanations of model behaviour aim to explain what led to a specific behaviour or output by referencing essential attributes or influences on that behaviour. It is not strictly necessary to understand the full set of relationships, dependencies, and weights of features within the model to explain model behaviour.
- Second, interpretability methods can be distinguished in how they conceptualise ‘explanation’. Many methods conceptualise explanations as approximation models: simpler, human-interpretable models created to reliably approximate the functionality of a more complex ‘black box’ model. The approximation model itself is often, and confusingly, referred to as an explanation of the ‘black box’ model. This approach contrasts with the treatment of ‘explanation’ in philosophy of science and epistemology, in which the term typically refers to explanatory statements that explain the causes of a given phenomenon.
The usage of ‘explanation’ in this fashion can be confusing. Approximation models are best thought of as tools from which explanatory statements about the original model can be derived. Explanatory statements themselves can be textual, quantitative, or visual, and report on several aspects of the model and its behaviours.
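To make this distinction concrete, the following sketch shows how an approximation (‘surrogate’) model can be built in practice and how explanatory statements might be derived from it. It is a minimal illustration only: the gradient-boosted ‘black box’, the synthetic data, and the clinical feature names (age, bp, hba1c, bmi) are hypothetical placeholders rather than components of any real system.

```python
# Minimal sketch of an approximation ("surrogate") model: a shallow decision tree
# is fitted to the *predictions* of a complex black-box model, and explanatory
# statements are then derived from the tree rather than from the black box itself.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                      # stand-in clinical features
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)      # stand-in diagnostic label

black_box = GradientBoostingClassifier().fit(X, y)  # the opaque model

# Fit a shallow, human-readable tree to approximate the black box's behaviour.
surrogate = DecisionTreeClassifier(max_depth=2)
surrogate.fit(X, black_box.predict(X))

# Readable rules are derived from the surrogate, not from the black box itself.
print(export_text(surrogate, feature_names=["age", "bp", "hba1c", "bmi"]))
```

The printed rules are the kind of human-readable artefact from which explanatory statements about the original model can be drawn, although the fidelity of the surrogate to the black box must itself be verified before those statements are relied upon.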
Further distinctions help classify different types of explanations and interpretability methods. A basic distinction in interpretability can be drawn between global and local interpretability. This distinction refers to the scope of the model or outputs a given interpretability or explanatory method aims to make human comprehensible. Global methods aim to explain the functionality of a model as a whole or across a particular set of outputs in terms of the significance of features, their dependencies or interactions, and their effect on outputs. In contrast, local methods can address, for example, the influence of specific areas of the input space or specific variables on one or more specific outputs of the model.
Models can be globally interpretable at a holistic or modular level. Holistic global interpretability refers to models which are comprehensible to a human observer in the sense that the observer can follow the entire logic or functional steps taken by the model which lead to all possible outcomes of the model. It should be possible for a single person to comprehend holistically interpretable models in their entirety. An observer would have ‘a holistic view of its features and each of the learned components such as weights, other parameters, and structures’.
Given the limitations of human comprehension and short-term memory, global holistic interpretability is currently only practically achievable on relatively simple models with few features, interactions, or rules, or strong linearity and monotonicity. For more complex models, global interpretability at a modular level may be feasible. This type of interpretability involves understanding a particular characteristic or segment of the model, for example the weights in a linear model, or the splits and leaf node predictions in a decision tree.
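A brief sketch may help illustrate global interpretability at a modular level as described above: inspecting the learned coefficients of a linear model without tracing every individual prediction. The logistic regression, synthetic data, and feature names below are illustrative assumptions only.

```python
# Minimal sketch of modular global interpretability: the coefficient table of a
# linear model is one "module" a reviewer can inspect directly, without following
# the model's reasoning for every possible input.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
features = ["age", "systolic_bp", "hba1c", "bmi"]  # hypothetical feature names
X = rng.normal(size=(300, len(features)))
y = (X[:, 1] + X[:, 2] > 0).astype(int)            # stand-in outcome label

model = LogisticRegression().fit(X, y)

# Inspect the learned weights, the modular component made comprehensible here.
for name, coef in zip(features, model.coef_[0]):
    print(f"{name:>12}: {coef:+.2f}")
```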
With regards to local interpretability, a single output can be considered interpretable if the steps that led to it can be explained. Local interpretability does not strictly require that the entire series of steps be explained; rather, it can be sufficient to explain one or more aspects of the model that led to the output, such as a critically influential feature value. A group of outputs is considered locally interpretable if the same methods used to produce explanations of individual outputs can be applied to the group. Groups can also be explained by methods that produce global interpretability at a modular level.
These distinctions lead to some initial conclusions about how AI can best explain itself to doctors and patients. At the point of adoption, global explanations of model functionality seem appropriate to ensure a reliable fit between the intended use of the AI system in a given healthcare context and the actual performance of the system. For explaining specific outputs or recommendations to patients, explanations of model behaviour formed as explanatory statements appear to strike the best balance between explaining the decision-making logic of the system and remaining comprehensible to expert and non-expert users alike. In this context, methods such as ‘counterfactual explanations’ may be preferable as they facilitate debugging and testing of system performance by expert users while remaining comprehensible on an individual explanation level to non-expert patients. To summarise, to make AI systems intelligible to patients, simple, local, contrastive explanations are preferable to global approximation explanations, which can be difficult to understand and interpret.
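The following sketch illustrates the contrastive, patient-level character of a counterfactual explanation in its simplest possible form: a brute-force search for the smallest change to a single feature that flips a classifier’s output. Established counterfactual methods optimise over all features with plausibility constraints; the model, feature names, and step sizes here are hypothetical simplifications.

```python
# Minimal sketch of a counterfactual explanation for a single output: find the
# smallest single-feature change that flips the model's prediction and report it
# as a contrastive, human-readable statement.
import numpy as np
from sklearn.linear_model import LogisticRegression

def counterfactual_single_feature(model, x, feature_names, step=0.1, max_steps=50):
    """Describe the smallest single-feature change that flips the prediction."""
    original = model.predict(x.reshape(1, -1))[0]
    best = None
    for j, name in enumerate(feature_names):
        for direction in (+1, -1):
            for k in range(1, max_steps + 1):
                x_cf = x.copy()
                x_cf[j] += direction * k * step
                if model.predict(x_cf.reshape(1, -1))[0] != original:
                    change = k * step
                    if best is None or change < best[2]:
                        best = (name, x_cf[j], change)
                    break
    if best is None:
        return "No single-feature counterfactual found within the search range."
    name, new_value, _ = best
    return f"If {name} had been {new_value:.2f}, the prediction would have changed."

# Illustrative usage on a toy model (all values and feature names are synthetic).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X, y)
print(counterfactual_single_feature(clf, X[0], ["age", "hba1c", "bmi"]))
```

The resulting statement (‘If age had been X, the prediction would have changed’) is local and contrastive, which is what makes this family of methods attractive for communicating individual outputs to non-expert patients.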
An alternative but complementary approach is to use only intrinsically interpretable models in clinical care to enable health professionals to holistically understand systems and better explain them to their patients. Implementing this approach would, however, create additional requirements for technical expertise in computer science, statistics, and machine learning among health professionals which could be very difficult and perhaps unreasonable to meet in practice.
Public register of medical AI systems for transparency
As regards the issue of disclosure to patients of the usage of AI systems for operational and clinical purposes discussed in the section entitled “Transparency to health professionals and patients”, the Parliamentary Assembly of the Council of Europe has recognised the importance of raising population awareness of uses of AI in healthcare to build trust with patients and ensure informed consent is possible in AI-mediated care. Specifically, their October 2020 report suggests that transparency of AI systems in healthcare “may require the establishment of a national health-data governance framework which could build on proposals from the international institutions. The latter include the Recommendation “Unboxing Artificial Intelligence: 10 steps to protect Human Rights” by the Council of Europe Commissioner for Human Rights (May 2019), the Ethics Guidelines for Trustworthy AI put forward by the European Union (April 2019), the OECD Recommendation and Principles on AI (May 2019) and the G20 Principles on Human-centred Artificial Intelligence (June 2019).”
Following these proposals and recommendations, a public database is seen as a key element to improve “algorithmic literacy” among the general public, which is a fundamental precursor to exercising many human and legal rights.
Insofar as the proposed framework is designed to increase population awareness of AI systems in healthcare, it can best be thought of as a type of public register for AI systems in healthcare. Registries are public lists of systems currently in use containing a standardised description of each system. The information included on registries varies, but can cover the intended usage or purpose of the system; its manufacturer or supplier; the underlying method(s) (e.g., deep learning, regression); any testing undergone, both in terms of accuracy and of biases and other ethical and legal dimensions; a description of training and testing datasets; and an explanation of how predictions or outputs of the system are utilised by human decision-makers or otherwise integrated into existing services and decision-making processes.[1] Registries also often have a feedback function to allow citizens to provide input on current and proposed uses of AI by public bodies and services.[2]
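As an indication of what a standardised registry record could contain, the following sketch encodes the fields listed above as a simple data structure. The schema and the example entry are purely illustrative assumptions; they do not reflect any existing register or a prescribed Council of Europe format.

```python
# Minimal sketch of a standardised register record for a medical AI system,
# based on the fields described above. All names and values are hypothetical.
from dataclasses import dataclass, asdict
import json

@dataclass
class RegistryEntry:
    system_name: str
    intended_use: str                  # clinical or operational purpose
    supplier: str                      # manufacturer or supplier
    methods: list[str]                 # e.g. deep learning, regression
    testing_summary: str               # accuracy, bias, other ethical/legal testing
    datasets_description: str          # training and testing data
    human_oversight: str               # how outputs feed into human decision-making
    public_feedback_contact: str = ""  # channel for citizen input

entry = RegistryEntry(
    system_name="Example triage support tool",
    intended_use="Prioritising referrals in outpatient dermatology",
    supplier="Hypothetical Supplier Ltd.",
    methods=["convolutional neural network"],
    testing_summary="External validation on two sites; subgroup error rates reported",
    datasets_description="Retrospective images from consenting patients, 2015-2020",
    human_oversight="Recommendations reviewed by a dermatologist before referral",
    public_feedback_contact="register-feedback@example.org",
)
print(json.dumps(asdict(entry), indent=2))
```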
There are several examples of existing registries from municipal, national, and international public bodies. In 2020, Amsterdam and Helsinki launched public registries for AI and algorithmic systems used to deliver municipal services.[3] In November 2021, the UK Cabinet Office’s Central Digital and Data Office launched a national algorithmic transparency standard which will effectively function as a type of public register.[4] Internationally, the recently proposed Artificial Intelligence Act contains a provision to create a public EU-wide database in which standalone high-risk AI applications must be registered.[5] The Council of Europe has an opportunity to complement these emerging transparency standards by introducing a public AI register for medical AI in member states which is aimed at patients to raise awareness of AI systems currently in use by their public health services.
Collection of sensitive data for bias and fairness auditing
Biases in AI systems linked to gaps in training and testing data could foreseeably motivate greater collection of sensitive data about legally protected groups for purposes of bias and fairness testing. It is generally accepted that, in order to prevent discriminatory or biased outcomes, data on sensitive groups must be collected. Failure to collect this data will not prevent discrimination against protected groups, but will arguably make it more difficult to detect. Sensitive data is needed to test whether automated decision-making discriminates against groups based on protected attributes (e.g., data on race, disability, sexual orientation). On the other hand, collecting such data has significant privacy implications. This is a legitimate concern, closely related to troubling historical experiences that significantly harmed specific groups in society. For example, data collected for research and public purposes have contributed to eugenics in Europe, the UK, and the US; genocide during WWII; racist immigration practices and the denial of basic human rights in the US; the justification of slavery; forced sterilisation in the UK, US, Germany, and Puerto Rico from the early to the mid-20th century; the punishment, castration, and imprisonment of members of LGBT communities; and the denial to women of equal rights and protections (e.g., against sexual violence). Clearly, privacy interests must be taken seriously when considering collection of sensitive personal data for purposes of bias testing.
Setting these concerns aside for a moment, one could be tempted to think that bias problems will naturally be solved by collecting more (sensitive) data and closing gaps in representation in training and testing datasets. However, fair and equal outcomes will not automatically result when representation gaps and other data biases are closed. Awareness of inequalities is not the same as rectifying them. Rather, the persistence of social biases across Western societies suggests that significant political, social, and legal effort is needed to overcome them, rather than simply more data collection and testing.
Countering inequalities requires intentional and often cost-intensive changes to decision processes, business models, and policies. To justify further collection and usage of sensitive data, it is necessary to first demonstrate serious commitment and political will to rectifying inequality. From a standard-setting perspective, these observations suggest that any proposed collection of sensitive category data for the sake of testing medical AI systems for biases must have clear purpose limitations and confidentiality guarantees in place, alongside a commitment to rectify the social inequalities underlying biases discovered through testing. Operationalising these commitments is not straightforward. The EU Artificial Intelligence Act, for example, proposes the creation of “regulatory sandboxes” in which AI providers can test their systems for bias using special category data collected explicitly for testing purposes. This proposal lacks the essential element of a commitment to rectify discovered inequalities.
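As the preceding discussion notes, such disparities cannot be measured at all without data on protected attributes. The following sketch illustrates the kind of minimal bias test that sensitive data enables: comparing false-negative and false-positive rates of a system’s outputs across protected groups. The synthetic labels, group names, and error rates are illustrative assumptions, not a prescribed auditing standard.

```python
# Minimal sketch of a group-level bias test: without the `sensitive` column,
# the disparity constructed below would be invisible to an auditor.
import numpy as np

def group_error_rates(y_true, y_pred, sensitive):
    """False-negative and false-positive rates per sensitive group."""
    rates = {}
    for group in np.unique(sensitive):
        mask = sensitive == group
        positives = mask & (y_true == 1)
        negatives = mask & (y_true == 0)
        fnr = np.mean(y_pred[positives] == 0) if positives.any() else float("nan")
        fpr = np.mean(y_pred[negatives] == 1) if negatives.any() else float("nan")
        rates[group] = {"false_negative_rate": fnr, "false_positive_rate": fpr}
    return rates

# Synthetic audit data with a deliberately higher error rate for one group.
rng = np.random.default_rng(3)
y_true = rng.integers(0, 2, size=1000)
sensitive = rng.choice(["group_a", "group_b"], size=1000)
y_pred = y_true.copy()
noise = rng.random(1000) < np.where(sensitive == "group_b", 0.25, 0.05)
y_pred[noise] = 1 - y_pred[noise]
print(group_error_rates(y_true, y_pred, sensitive))
```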