Written by Holistic AI
June 6th, 2022
Bias ex machina: NYC mandates AI audits

The problem of bias in AI

In 2014, Amazon began developing an AI tool to sort through the resumes of job candidates and rate each candidate on a five-star scale to determine who was suitable for employment. However, Amazon ultimately mothballed the tool upon discovering that it penalized candidates whose resumes contained the word ‘women’ (as in, ‘captain of the women’s football team’). Although the tool was designed to be agnostic to the identity of candidates (their race, gender, disability status, etc.), it had been trained to recognize and replicate patterns in Amazon’s historical recruitment data—a data set that, as with much of the tech sector, skewed towards men.

Despite its capacity for impartiality, an AI is only as fair as the data it is fed: for Amazon, it seems to have been a case of ‘garbage in, garbage out’, as the AI reproduced the unfair hiring patterns that permeate the workforce and that get reflected in the data. Given the ubiquity of inequality and discrimination, this problem is not unique to Amazon. Academic research has revealed racial and gender biases in a range of AI tools, including facial recognition systems and natural language processors. For example, a 2018 study revealed that three popular facial recognition systems misclassified up to 34.7% of darker-skinned females, compared with 0.8% of lighter-skinned males—likely because the systems were trained on images in which lighter-skinned men were overrepresented.

The Amazon case portends a wider problem: 55% of US human resources managers will integrate AI tools into their recruitment process in the next five years. AI-driven employment tools—including video interviews, game- and image-based assessments, and resume screening tools—are becoming industry standards. Left unchecked, these tools have the potential to reproduce the unfair hiring patterns that permeate the workforce and that get reflected in recruitment data.

Bias mandate

To address this rising concern, the New York City Council has passed a new law that will mandate bias audits for automated employment decision tools. A bias audit is an impartial evaluation conducted by an independent auditor to determine whether an algorithmic system discriminates against individuals on the basis of protected characteristics (gender, race, etc.). Audits will now become mandatory for all firms with more than 100 employees that use AI-driven tools to assist in the evaluation of candidates or employees residing in New York City. Moreover, the legislation imposes a transparency obligation: firms must disclose that they will use an automated decision tool to evaluate candidates and employees 10 working days before they do so.

The legislation is a ground-breaking intervention, but it will not be singular for long: regulatory action under consideration in the UK, European Union and US senate will require the owners of AI tools to audit their tools to prevent and mitigate numerous risks, including bias and discrimination. As legislatures begin to map out the regulatory framework for AI, they are increasingly looking to auditing as a means towards creating ‘ecosystems of trust’.

Despite New York City leading these efforts, there are some shortcomings with the legislation that could limit its impact. These limitations concern a lack of clarity around who is qualified to conduct audits, the impact that the required notice period could have on hiring timelines, and uncertainty about compliance for employers operating in New York City but making employment decisions about candidates or employees residing outside of the City. Moreover, the legislation raises broader interpretive challenges insofar as it is premised on ever-changing notions of ‘identity’ and ‘harm’. We discuss these points in further detail below.

Impartial evaluations by independent auditors – in defining bias audits, the legislation specifies that they should be impartial evaluations conducted by independent auditors, which we read as a stipulation that audits should follow a reasonable and standardised methodology. However, as we discuss below, there are multiple approaches to measuring bias, there is a lack of codified standards for conducting audits, and the legislation does not specify whether auditors need to be accredited in some way. This contrasts with financial auditing, where auditors must have appropriate certification and follow codified standards when carrying out audits.

Jurisdiction outside of city limits – the legislation specifies that employers using automated employment decision tools to evaluate candidates or employees residing in the city must comply, but it does not make clear whether this also applies to New York City-based firms hiring candidates outside of the city limits. For example, would the legislation apply when workers are remote but headquarters are registered in New York City? What if a candidate or employee moves into the City less than 10 days before the tool is scheduled to be used?

Impact on hiring timelines – although giving candidates or employees notice of the intended use of automated employment decision tools 10 business days prior to their use gives them time to adequately consider how such a tool may impact them, having to wait two weeks before using a tool might disrupt and delay hiring timelines. This may be problematic both for candidates, whose application process is dragged out, and for employers, who want to fill a position quickly.

Measuring and mitigating bias

The bias mandate stipulates that the audit must include—but not be limited to—an assessment of the disparate impact of the tool. Even if a process or policy is formally neutral with respect to protected characteristics, US labor law presumes it is discriminatory if it produces significantly disparate rates of selection between social groups for hiring, promotion, and other employment decisions. Under the so-called four-fifths rule, if the rate of selection of candidates from a minority group is less than 80% of the equivalent rate for the majority group, there is a presumption of discrimination, unless the firm can establish that the differential results are necessary for its legitimate business interests (e.g., a hiring process for lumberjacks that hires more males because physical strength is a prerequisite for the job and males are typically stronger than females).
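
The disparate-impact check above is simple enough to sketch in a few lines of Python. This is a minimal illustration, not a full audit: the group names and selection counts are hypothetical, and a real audit would cover many groups and decision stages.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Fraction of a group's applicants who were selected."""
    return selected / applicants

def impact_ratio(minority_rate: float, majority_rate: float) -> float:
    """Minority selection rate as a fraction of the majority rate."""
    return minority_rate / majority_rate

# Hypothetical audit figures: 25 of 100 minority applicants selected,
# versus 40 of 100 majority applicants.
minority = selection_rate(25, 100)   # 0.25
majority = selection_rate(40, 100)   # 0.40
ratio = impact_ratio(minority, majority)  # 0.625

# A ratio below 0.8 raises a presumption of disparate impact.
flagged = ratio < 0.8
print(f"impact ratio = {ratio:.3f}, flagged = {flagged}")
```

On these numbers the ratio is 0.625, below the 0.8 threshold, so the tool would be flagged unless the firm could establish a legitimate business justification.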

If a decision tool does not satisfy this standard without legitimate business justification, this does not mean that it must be scrapped: mitigation strategies can diminish the bias in an AI tool. Mitigation can occur at the following stages of an AI tool’s lifecycle:

  1. Pre-processing. Before an AI model is trained, we can reduce or remove bias in the training data set through resampling, reweighting data points, and removing features from the data set.
  2. In-processing. Since AI models are trained to optimize specified objectives, the model’s training objective can be modified to also optimize for fairness.
  3. Post-processing. The output of a biased AI tool can be adjusted to achieve fairness by reprocessing the output data with a model that optimizes for fairness.
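
As an illustration of the pre-processing stage, one common reweighting approach assigns each (group, label) pair a weight so that group membership and the outcome label become statistically independent in the training data. The sketch below uses a hypothetical toy dataset; real audits would apply this to actual recruitment records.

```python
from collections import Counter

# Hypothetical training records: (protected_group, hired_label)
samples = [
    ("a", 1), ("a", 1), ("a", 0), ("a", 0), ("a", 0),
    ("b", 1), ("b", 0), ("b", 0), ("b", 0), ("b", 0),
]
n = len(samples)
group_counts = Counter(g for g, _ in samples)
label_counts = Counter(y for _, y in samples)
pair_counts = Counter(samples)

def weight(group: str, label: int) -> float:
    """Expected frequency under independence divided by observed frequency."""
    expected = (group_counts[group] / n) * (label_counts[label] / n)
    observed = pair_counts[(group, label)] / n
    return expected / observed

for g, y in sorted(pair_counts):
    print(g, y, round(weight(g, y), 3))
```

Here group ‘a’ is hired more often than group ‘b’, so hired members of ‘b’ receive a weight above 1 and hired members of ‘a’ a weight below 1; training on the weighted data discourages the model from learning the association between group and outcome.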

By modifying the data that we feed AI models, the architecture of models, and how we use models’ outputs, we can mitigate the risk of bias and discrimination in AI decision tools. This task requires domain-specific experts who will be able to benchmark the process and data against relevant comparators in the field.

However, this field must refine the tools at its disposal if it is to produce genuinely fair and reliable AI. Research has demonstrated that constraining a model to increase fairness reduces its overall accuracy, particularly where a model is designed to meet more than one ‘bias objective’ (i.e., different mathematical constraints that reflect distinct conceptions of ‘fairness’ and ‘discrimination’). Moreover, research has also shown that eliminating disparities becomes increasingly difficult as one tries to equalize across a greater number of protected characteristics: in other words, whilst we might easily correct for unfairness along a single dimension (e.g., race), we might have much greater difficulty correcting for intersectional unfairness (e.g., the unfairness experienced by Black women in particular).

Bias auditing in practice

At Holistic AI, we have conducted bias audits for clients across a range of industries, sizes, and jurisdictions. Despite the nascence of the field, we have noted important trends:

  1. Defaulting to humans. In a preponderance of cases, the most effective mitigation strategy is to insert human oversight into the process. This ‘human-in-the-loop’ strategy mitigates potential AI bias but, of course, reintroduces the possibility of human bias.
  2. The intersectional challenge. Although AI systems can be adjusted for fairness along single identity dimensions (e.g., race), it is a far greater challenge to prevent intersectional discrimination: the forms of disadvantage and bias that accrue around composite identities (e.g., bias against Black women under 40).
  3. Privacy and fairness. Whilst the imperative of privacy is to protect personal information (particularly from automated systems), this presents an impediment to fairness auditing, since it is precisely individuals’ personal information that we need in order to determine whether they are suffering identity-based discrimination. There is thus a tradeoff between these imperatives.

Conclusion: an ecosystem of trust

The New York Bias Mandate portends the emergence of a wider regulatory ecosystem for artificial intelligence in the United States. This regulatory creation mirrors similar interventions in the EU AI Act, UK AI Strategy, and the US’s proposed federal Algorithmic Accountability Act, where regulators are increasingly looking to place auditing and assurance at the centre of their AI strategy. As in the New York case, these interventions will require any commercial application of AI to undergo independent assessment and verification of its risks. However, the remit in these cases is wider: rather than testing AI for fairness, these interventions propose assessing AI across several risk vectors, including transparency, fairness, reliability, and privacy.

The ambition, in each case, is that auditing and assurance will allow us to unlock the potential of artificial intelligence, whilst imbuing the technology with human-centric values that preserve the rights and dignity of individual users. Despite the shortcomings we point to above, the NYC bias mandate is a commendable move in this direction.
