AI auditing is the practice of assessing, mitigating, and assuring an algorithm’s safety, legality, and ethics.
Comprehensive audits encompass the entire pipeline of the system’s life cycle, addressing areas such as the reporting and justification of the business case, assessment of the development team, and test datasets. Depending on the level of access granted to the auditor, they can also require access to the inputs and outputs of the system and information about the inner workings of the model.
The purpose of AI auditing is to assess a system, mapping out its risks in both its technical functionality and its governance structure and recommending measures that can be taken to mitigate these risks.
Algorithm auditing is an ongoing process that requires holistic knowledge of a system, including the context it is used in, what it was designed to do, and the type of technology used. The process of AI auditing has four distinct stages: triage, assessment, mitigation, and assurance.
During triage, the initial stage of an audit, the system is documented and its processes are assigned an inherent risk level ranging from high-risk to low-risk. The risk level given to a system depends on factors including the context that it is used in and the type of AI that it utilises.
During the assessment phase, the system is evaluated against five main verticals (efficacy, robustness, bias, explainability, and privacy) to give a clear picture of its current state:
The key question in the efficacy vertical is whether the system delivers an appropriate performance level that matches its use case and context. In other words, does the system do what it is meant to and perform as expected? This vertical is particularly important for systems where failure would have significant consequences, such as financial loss. Ensuring a system is efficacious can prevent a poorly performing system from being deployed and derailing essential processes.
The robustness vertical addresses the risk that the algorithm fails in unexpected circumstances or when under attack, by asking whether the system is reliable and robust to changes in data or attacks from adversaries. It investigates whether the system has been trained to withstand adversarial attacks, whether it performs differently in different contexts, and whether the algorithm performs as expected on unseen data.
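One of the robustness questions above, whether performance holds up when the data changes, can be illustrated with a small check that compares accuracy on clean inputs with accuracy on noisy copies of the same inputs. This is only a minimal sketch: `toy_model`, the sample data, and the choice of Gaussian noise are assumptions made for illustration, not part of any particular audit framework.

```python
import random

def accuracy(predict, inputs, labels):
    """Fraction of inputs for which predict(x) equals the true label."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)

def perturb(inputs, noise_scale, rng):
    """Add Gaussian noise to every feature to simulate a shift in the data."""
    return [[v + rng.gauss(0.0, noise_scale) for v in x] for x in inputs]

def robustness_gap(predict, inputs, labels, noise_scale, seed=0):
    """Accuracy on clean inputs minus accuracy on noisy copies of them."""
    rng = random.Random(seed)
    clean = accuracy(predict, inputs, labels)
    noisy = accuracy(predict, perturb(inputs, noise_scale, rng), labels)
    return clean - noisy

# Hypothetical stand-in for the audited model: a one-feature threshold rule.
def toy_model(x):
    return 1 if x[0] >= 0.5 else 0

inputs = [[0.1], [0.2], [0.45], [0.55], [0.8], [0.9]]
labels = [0, 0, 0, 1, 1, 1]

gap = robustness_gap(toy_model, inputs, labels, noise_scale=0.3)
print(f"accuracy lost under noise: {gap:.2f}")
```

A large gap suggests the system is sensitive to small changes in its inputs. In practice the perturbation would be chosen to reflect realistic shifts or attacks for the system's context.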
Algorithmic bias can manifest in several ways, with varying degrees of consequences for different subject groups. As such, the bias vertical investigates whether the system treats individuals fairly regardless of their subgroup membership, or whether it performs differently across groups based on characteristics such as age, gender, and ethnicity. Ensuring a system is free from bias can prevent preferential or discriminatory treatment of individuals, lead to fairer outcomes, and help to ensure compliance with equal opportunity laws.
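To make the idea of different treatment across groups concrete, the sketch below computes the selection rate for each group and each group's impact ratio relative to the most-selected group. The group labels, the outcomes, and the 0.8 flagging threshold (a commonly cited "four-fifths" rule of thumb) are assumptions for illustration; a real audit would choose metrics and thresholds suited to the system and the applicable regulation.

```python
from collections import defaultdict

def selection_rates(groups, outcomes):
    """Share of positive outcomes (1 = selected) within each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, outcome in zip(groups, outcomes):
        totals[group] += 1
        selected[group] += outcome
    return {group: selected[group] / totals[group] for group in totals}

def impact_ratios(rates):
    """Each group's selection rate divided by the highest selection rate."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any positive outcomes")
    return {group: rate / best for group, rate in rates.items()}

# Hypothetical data: group membership and binary decisions for 8 individuals.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, outcomes)
ratios = impact_ratios(rates)

# Assumed threshold: flag groups whose impact ratio falls below 0.8.
FLAG_THRESHOLD = 0.8
flagged = [group for group, ratio in ratios.items() if ratio < FLAG_THRESHOLD]
print(rates, ratios, flagged)
```

Here group "a" is selected 75% of the time and group "b" 25% of the time, so group "b" has an impact ratio of about 0.33 and is flagged for closer review.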
To evaluate the explainability of a system, questions centre around whether the system's outputs are understood, whether the capabilities and purpose of a system are communicated to relevant stakeholders, and whether the mechanics of the system are explainable in human terms. The vertical of explainability is key for critical applications that affect a large number of users and is important in cases where the outcomes of systems are disputed.
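One simple way to probe how much a model relies on each input is permutation importance: shuffle one feature's values across the dataset and measure how much accuracy drops. The sketch below is illustrative only; `toy_model` and the data are hypothetical, and permutation importance is just one of several interpretation techniques an auditor might use.

```python
import random

def permutation_importance(predict, inputs, labels, feature, seed=0):
    """Drop in accuracy after shuffling one feature column across rows."""
    rng = random.Random(seed)

    def acc(rows):
        return sum(predict(x) == y for x, y in zip(rows, labels)) / len(labels)

    baseline = acc(inputs)
    column = [x[feature] for x in inputs]
    rng.shuffle(column)
    # Rebuild each row with the shuffled value in place of the original one.
    shuffled = [
        x[:feature] + [value] + x[feature + 1:]
        for x, value in zip(inputs, column)
    ]
    return baseline - acc(shuffled)

# Hypothetical model that only looks at feature 0 and ignores feature 1.
def toy_model(x):
    return 1 if x[0] >= 0.5 else 0

inputs = [[0.1, 7.0], [0.2, 3.0], [0.4, 9.0], [0.6, 1.0], [0.8, 5.0], [0.9, 2.0]]
labels = [0, 0, 0, 1, 1, 1]

for feature in (0, 1):
    drop = permutation_importance(toy_model, inputs, labels, feature)
    print(f"feature {feature}: accuracy drop {drop:.2f}")
```

Because the toy model ignores feature 1, shuffling that feature leaves accuracy unchanged, while shuffling feature 0 can only reduce it. Results like these can be communicated to stakeholders in plain terms: which inputs drive the system's decisions and which do not.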
The privacy vertical is most important for applications that process personal and sensitive data, and can be assessed by investigating whether the system has appropriate data minimisation and data stewardship practices. Having adequate privacy mechanisms can prevent data breaches and unlawful processing, enable swift action in the event of a breach, and help ensure that individuals consent to the use of their data.
The outcomes of the assessment are used to inform the residual risk of the system. Based on this, actions to lower this risk are suggested. These can be technical, addressing the system itself, or non-technical, addressing issues such as system governance, accountability, and documentation.
For example, depending on the source of the bias, it can be mitigated by debiasing the data the model is trained on, amending the model to make it fairer across groups, or amending the outputs of the model to make the predictions fairer. To reduce explainability risks, better documentation procedures can be developed, and tools can be used to better interpret the model's decisions, including how different features are weighted.
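As an illustration of debiasing training data, the sketch below uses reweighing, a pre-processing technique that gives each training example a weight so that group membership and label become statistically independent in the weighted data. It is one possible approach among many, and the data shown is hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total

# Hypothetical training data where group "a" has far more positive labels.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(weights, rate_a, rate_b)
```

Under-represented combinations, such as positive labels in group "b", receive larger weights, so both groups end up with the same weighted positive rate. The weights would then be passed to a training procedure that supports per-example weighting.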
Assurance is the process of declaring that a system conforms to predetermined standards, practices, or regulations. Assurance can also be given on a conditional basis, with mitigation actions still outstanding for higher risk processes.
For organisations operating in areas where such audits are required, such as New York City, assurance would also take the form of certification that the requirements of the regulation have been met. Similarly, the EU AI Act will require conformity assessments to ensure that high-risk systems meet the obligations imposed on them.
Audits can be carried out at any point in the life cycle of a system, including during the design phase or once the system has been deployed, so it is never too early or too late to start the process. At Holistic AI, we are thought leaders in AI ethics and AI auditing, having published over 50 papers in this space. Through our applied research, we have developed our framework for audits, published an open-source library of metrics and mitigation techniques, and audited over 100 projects using our software platform.
To find out more about getting started on your AI auditing journey, schedule a demo with one of our experts.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.