Holistic AI Library Tutorial: Bias Measuring and Mitigation in Regression Tasks

Authored by Franklin Cardenoso Fernandez, Researcher at Holistic AI
Published on May 30, 2023

Bias problems can appear in models when their predictions systematically deviate from the true values for certain subgroups in the dataset. This phenomenon can occur for various reasons, such as unbalanced groups in the training data, feature selection, or model specification. Because of this, bias can appear in different kinds of tasks, including regression tasks.

One of the most popular datasets for regression analysis is the "Communities and Crime" dataset, a public dataset that contains socio-economic and crime-related information about communities in the United States. The data was collected by the US Census Bureau and the Federal Bureau of Investigation (FBI) between 1990 and 1992 and contains 1994 instances, each representing a different community. The attributes in the dataset include demographic information, such as population size, race, and income, as well as crime-related information, such as the number of murders, rapes, and robberies that occurred in each community.

This dataset has been found to be biased in several ways, which can affect the accuracy and fairness of any analysis or predictive models built from it. One major source of bias in the dataset is its missing values, particularly for certain variables such as the percentage of households with a father present, the percentage of the population living in poverty, and the number of police officers per capita. Another source of bias in the dataset is its geographic coverage including only information on communities within the United States, which may not be representative of communities in other countries or regions.

Furthermore, the dataset has been criticized for its potential for perpetuating or reinforcing stereotypes, because some of the variables in the dataset, such as the percentage of the population that is black or Hispanic, may be used to unfairly stigmatize certain demographic groups and perpetuate negative stereotypes. Therefore, it is important to be aware of these potential biases and limitations when working with this dataset and to take steps to mitigate any potential biases in any analysis or predictive models built from it.

Bias-free regression analysis: a toolkit for detection and mitigation strategies

There are various techniques to measure bias in regression tasks. One common approach is to use fairness metrics such as demographic parity, equalized odds, or equal opportunity, which quantify the differences in the model's performance across different subgroups based on sensitive attributes such as gender or race.

Once bias is detected, we can employ different techniques to mitigate it. These methods can be grouped into three categories: pre-processing, in-processing and post-processing methods. Pre-processing techniques adjust the training data to remove bias, in-processing methods build models that are robust against bias during training, and post-processing techniques adjust the model's predictions to remove bias.
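To make the post-processing category more concrete, here is a deliberately simple sketch that shifts each group's predictions so that both groups share the same mean. This is only a toy illustration of the idea, not a method taken from the holisticai library; the function name match_group_means and the sample values are assumptions made for this example.

```python
import numpy as np

def match_group_means(y_pred, group_a, group_b):
    """Toy post-processing step: shift each group's predictions
    so that both groups end up with the overall mean prediction."""
    y_pred = np.asarray(y_pred, dtype=float)
    group_a = np.asarray(group_a, dtype=bool)
    group_b = np.asarray(group_b, dtype=bool)
    adjusted = y_pred.copy()
    overall_mean = y_pred[group_a | group_b].mean()
    for mask in (group_a, group_b):
        # Move the group's mean onto the overall mean
        adjusted[mask] += overall_mean - y_pred[mask].mean()
    return adjusted

# Hypothetical predictions and group membership
y_pred = np.array([0.2, 0.4, 0.6, 0.8])
group_a = np.array([True, True, False, False])
group_b = ~group_a

adjusted = match_group_means(y_pred, group_a, group_b)
print(adjusted[group_a].mean(), adjusted[group_b].mean())
```

A shift like this equalizes the average score between groups but usually changes the prediction error, which is the accuracy and fairness trade-off in its simplest form.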

In this article, we demonstrate tools that can be easily applied to measure and mitigate bias in regression models. This tutorial is based on the set of regression tutorials in the Holistic AI repository.

The tutorial will follow these stages:

  1. Data loading and packages installation
  2. Dataset preprocessing
  3. Data analysis
  4. Model training
  5. Bias measuring
  6. Bias mitigation
  7. Results comparison

1. Data loading and packages installation

First, we need to install the required packages to perform our bias analysis and mitigation. In this case, we will use the holisticai package, which can be installed by running the following command:


!pip install holisticai

Now we need to import the required packages:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from urllib.request import urlopen
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

The dataset that we will use is the "Communities and Crime" dataset which is a publicly available dataset that contains socio-economic and law enforcement data for 1994 communities in the United States. The objective is to predict the crime rate per capita in each community with information that contains demographic variables such as population size, race, and education level, as well as variables related to law enforcement.

This dataset can be downloaded from its original repository or can be imported from the holisticai library:


from holisticai.datasets import load_us_crime
dataset = load_us_crime(return_X_y=False, as_frame=True)
df = pd.concat([dataset["data"], dataset["target"]], axis=1)
df.head()

A simple inspection shows us that there are some rows which present missing data. Therefore, we need to perform a data pre-processing step before feeding it into the machine learning model.

2. Dataset preprocessing

Next, we need to identify how much data is missing in order to decide whether to remove it or fill it with some values. Since several columns have a large amount of missing data, we can simply drop them, along with the remaining instances that still contain missing values.


df_clean = df.iloc[:,[i for i,n in enumerate(df.isna().sum(axis=0).T.values) if n<1000]]
df_clean = df_clean.dropna()
df_clean.head()
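To see how the missing-value counts drive this kind of clean-up, the following minimal sketch applies the same logic to a small made-up DataFrame. The column names and the max_missing threshold are hypothetical and chosen only so that the effect is easy to follow.

```python
import numpy as np
import pandas as pd

# Toy data: one column is mostly empty, one has a single gap
toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "one_missing": [0.1, np.nan, 0.3, 0.4],
    "complete": [1.0, 2.0, 3.0, 4.0],
})

# Count the missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Keep only the columns below a (hypothetical) missing-value threshold
max_missing = 2
kept = toy.loc[:, missing_per_column < max_missing]

# Drop the rows that still contain missing values
toy_clean = kept.dropna()
print(toy_clean)
```

Printing the per-column counts first makes the choice of threshold explicit before any data is discarded.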

3. Data analysis

Once the dataset has been preprocessed, we can start defining the protected groups. To do that, we need to analyse the race features and choose the protected attribute.

Therefore, we will inspect which race columns we can find in the dataset:


cols = [c for c in df.columns if c.startswith('race')]
print(cols)
['racepctblack', 'racePctWhite', 'racePctAsian', 'racePctHisp']

As we can see, there are four race categories that can be used for our purpose. For now, we will select the racepctblack column as the protected attribute to begin the bias analysis. Now that we have defined our protected group, let's remove the unnecessary columns from the dataset.
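The code that turns the protected attribute into the group_a and group_b membership vectors used later is not shown in this article. One possible way to build them is to threshold the chosen column, as sketched below on made-up values. The threshold of 0.5 and the choice of which side becomes group_a are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the racepctblack column
toy = pd.DataFrame({"racepctblack": [0.02, 0.10, 0.55, 0.80, 0.31]})

# Hypothetical cut-off: communities above it form group_a
threshold = 0.5
group_a = (toy["racepctblack"] > threshold).to_numpy()

# Every remaining community belongs to group_b
group_b = ~group_a

print(group_a)
print(group_b)
```

Whatever rule is used, the two vectors need to be built before the race columns are removed from the feature table.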


cols = [c for c in df_clean.columns if (not c.startswith('race')) and (not c.startswith('age'))]
df_clean = df_clean[cols].iloc[:,3:]
df_clean.head()

Now, let's see which are the ten columns with the highest correlations with the objective column.

[Figure: the ten columns with the highest correlations with the objective column]

From this graph, we can see that, interestingly, the feature most closely related to the target attribute is the "percentage of children born to never married individuals" (PctIlleg feature), followed by other interesting features such as males and females who are divorced, people under the poverty level, and unemployed people, among others.
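The code behind the correlation ranking is not reproduced here. A minimal sketch of how such a ranking could be computed with pandas is given below, using synthetic data rather than the crime dataset. The column names are invented for the example.

```python
import numpy as np
import pandas as pd

# Synthetic data: "strong" tracks the target closely, "weak" loosely
rng = np.random.default_rng(0)
n = 200
target = rng.normal(size=n)
toy = pd.DataFrame({
    "strong": target + 0.1 * rng.normal(size=n),
    "weak": target + 2.0 * rng.normal(size=n),
    "noise": rng.normal(size=n),
    "target": target,
})

# Correlation of every feature with the target column
correlations = toy.corr()["target"].drop("target")

# Rank by absolute correlation and keep (up to) the ten largest
top = correlations.abs().sort_values(ascending=False).head(10)
print(top)
```

With matplotlib installed, calling top.plot.barh() on the result gives a bar chart of the ranking.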

Next, we will define our training and testing datasets:


X = df_clean.values[:,:-1]
y = df_clean.values[:,-1]
# group_a and group_b hold the protected-group membership of each row
# and must already be defined at this point
X_train,X_test,y_train,y_test, group_a_tr, group_a_ts, group_b_tr, group_b_ts = \
   train_test_split(X, y, group_a, group_b, test_size=0.2, random_state=42)
train_data = X_train, y_train, group_a_tr, group_b_tr
test_data  = X_test, y_test, group_a_ts, group_b_ts
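Passing the group vectors to train_test_split together with X and y keeps every row aligned with its group membership after shuffling. The small sketch below checks this on toy arrays, where each feature row encodes its own index so that the alignment is easy to verify.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays: row i of X is [2*i, 2*i + 1], and y[i] = i
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
group_a = y % 2 == 0
group_b = ~group_a

# All four arrays are shuffled and split with the same row permutation
X_tr, X_ts, y_tr, y_ts, a_tr, a_ts, b_tr, b_ts = train_test_split(
    X, y, group_a, group_b, test_size=0.2, random_state=42
)

# Recover the row index from X and compare it with y and the groups
aligned = (
    np.array_equal(X_ts[:, 0] // 2, y_ts)
    and np.array_equal(a_ts, y_ts % 2 == 0)
)
print(aligned)
```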

4. Model training

With the training and testing sets at hand, we can train our ML model as usual; this model will serve as the baseline for later analysis and comparison. Since the protected groups were already separated from the dataset, we do not need to handle them here, but remember to always remove them from the features so that these attributes do not influence the training process. For the training, we will use a traditional pipeline: we fit a scaler on the training data and re-scale it, and then train a "Linear regression" model on the scaled data. Once the model has been trained, we can use its predictions to calculate its fairness metrics.


X, y, group_a, group_b = train_data
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Train a simple linear regression model
LR = LinearRegression()
model = LR.fit(X, y)
X, y, group_a, group_b = test_data
X = scaler.transform(X)
# Predict values
y_pred = model.predict(X)
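# Calculate the error (RMSE); taking the square root explicitly avoids the
# `squared` argument, which recent scikit-learn versions have deprecated
baseline_rmse = np.sqrt(mean_squared_error(y, y_pred))
print("RMS error: {}".format(baseline_rmse))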
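The same scale-then-fit steps can also be chained with scikit-learn's make_pipeline, which ensures that the scaler is fitted on the training data only. Below is a minimal sketch on synthetic data (not the crime dataset); the coefficients and noise level are arbitrary choices for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with a known linear relationship
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=0.2, random_state=42)

# The scaler is fitted inside the pipeline, on the training split only
baseline = make_pipeline(StandardScaler(), LinearRegression())
baseline.fit(X_tr, y_tr)

# RMSE on the held-out split
rmse = np.sqrt(mean_squared_error(y_ts, baseline.predict(X_ts)))
print("RMS error: {}".format(rmse))
```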

5. Bias measuring

The fairness of the model can be calculated using the predictions of the model and the protected groups defined previously. The holisticai library contains a module that calculates a set of metrics useful in evaluating the fairness of algorithmic decisions. In our case, we will use the regression_bias_metrics function, which allows us to select which metrics to calculate: equal_outcome, equal_opportunity, or both. Here, equal_outcome shows how disadvantaged groups are treated by the model, and equal_opportunity shows whether all the groups have the same opportunities.


from holisticai.bias.metrics import regression_bias_metrics
df = regression_bias_metrics(
   group_a,
   group_b,
   y_pred,
   y,
   metric_type='both'
)
y_baseline = y_pred.copy()
df_baseline=df.copy()
df_baseline
[Table: bias metrics of the baseline model]

All of these metrics indicate different aspects of the fairness of ML models, for example:

  • Disparate Impact Q: Shows the ratio of success rates between the protected groups for a certain quantile. Values below 1 are unfair towards group_a. Values above 1 are unfair towards group_b. The range (0.8, 1.2) is considered acceptable.
  • Statistical parity: Computes the difference in success rates between the protected groups. Values below 0 are considered unfair towards group_a while values above 0 are considered unfair towards group_b.
  • Average score difference: Computes the difference in average scores between the protected groups. Negative values indicate that group_a has a lower average score, so bias against group_a, while positive values indicate group_b has a lower average score, so bias against group_b.
  • Z score difference: Computes the spread in Z-scores between the protected groups. The Z-score is a normalised version of Disparate Impact.
  • Max Statistical Parity: Computes the maximum, over all thresholds, of the absolute statistical parity between the protected groups. Values below 0.1 in absolute value are considered acceptable.
  • RMSE ratio: Computes the ratio of the RMSE between the protected groups. Lower values show bias against group_a, while higher values show bias against group_b.
  • MAE ratio: Like the previous metric, but computes the ratio of the MAE between the protected groups. Lower values show bias against group_a, while higher values show bias against group_b.
  • Correlation difference: Computes the difference in correlation between predictions and targets for the protected groups, positive values show bias against group_a while negative values show bias against group_b.
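To make some of these definitions concrete, the sketch below computes a few of the metrics by hand with NumPy on made-up values. It follows the descriptions in the list above and is not the library's implementation: the quantile of 0.5 used as the success threshold, the helper names, and the orientation of the RMSE ratio (chosen so that values below 1 mean larger errors for group_a) are assumptions for this example.

```python
import numpy as np

def success_rate(y_pred, mask, threshold):
    # Share of the group's predictions at or above the threshold
    return np.mean(y_pred[mask] >= threshold)

def rmse(y_true, y_pred, mask):
    # Root mean squared error restricted to one group
    return np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2))

# Made-up targets, predictions and group membership
y_true = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y_pred = np.array([0.1, 0.3, 0.2, 0.5, 0.4, 0.9, 0.7, 0.9])
group_a = np.array([True, True, True, True, False, False, False, False])
group_b = ~group_a

# "Success" means a prediction at or above the chosen quantile
threshold = np.quantile(y_pred, 0.5)
sr_a = success_rate(y_pred, group_a, threshold)
sr_b = success_rate(y_pred, group_b, threshold)

disparate_impact = sr_a / sr_b        # ratio of success rates
statistical_parity = sr_a - sr_b      # difference of success rates
avg_score_diff = y_pred[group_a].mean() - y_pred[group_b].mean()
rmse_ratio = rmse(y_true, y_pred, group_b) / rmse(y_true, y_pred, group_a)

print(disparate_impact, statistical_parity, avg_score_diff, rmse_ratio)
```

In this toy case the disparate impact is well below 0.8 and the statistical parity is negative, so both point to unfairness towards group_a, while the RMSE ratio above 1 reflects larger errors for group_b.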

6. Bias mitigation

From the previous chart, we can observe that the model metrics are far from the desired values. Therefore, we need to apply some strategy to mitigate the bias present in the model.

There exist several kinds of strategies, and the literature has divided them into three categories: pre-processing, in-processing and post-processing methods. The holisticai library allows you to implement different algorithms from these categories for bias mitigation; you can look at them in the documentation page. An interesting feature is that all of them are compatible with the Scikit-learn package, so if you are familiar with this package, you will not have problems using the library. Here, we will use the "pipeline" module to perform the model implementation.

To perform the mitigation in this example, we will implement the "Exponentiated gradient reduction" method, which is an in-processing technique that reduces fair classification to a sequence of cost-sensitive classification problems and returns a randomized classifier with the lowest empirical error subject to fair classification constraints. For our regression task, we configure it with the "BoundedGroupLoss" constraint and a square loss.


from holisticai.pipeline import Pipeline
from holisticai.bias.mitigation import ExponentiatedGradientReduction

# Wrap the baseline estimator with the in-processing mitigator
inprocessing_model = ExponentiatedGradientReduction(
   constraints="BoundedGroupLoss",
   loss='Square', min_val=-0.1, max_val=1.3, upper_bound=0.001,
).transform_estimator(model)

pipeline = Pipeline(
   steps=[
       ('scalar', StandardScaler()),
       ("bm_inprocessing", inprocessing_model),
   ]
)

# Fit on the training data, passing the protected groups to the mitigator
X, y, group_a, group_b = train_data
fit_params = {
   "bm__group_a": group_a,
   "bm__group_b": group_b
}

pipeline.fit(X, y, **fit_params)

# Predict on the test data and compute the bias metrics again
X, y, group_a, group_b = test_data
predict_params = {
   "bm__group_a": group_a,
   "bm__group_b": group_b,
}
y_pred = pipeline.predict(X, **predict_params)
df_exp_grad_w_p = regression_bias_metrics(
   group_a,
   group_b,
   y_pred,
   y,
   metric_type='both'
)
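The names of the hyperparameters suggest what the constraint asks for: the average loss of each group should stay below upper_bound, with the values clipped to the range between min_val and max_val before the loss is computed. The sketch below checks such a condition by hand on made-up numbers. It reflects our reading of a bounded group loss with a square loss, not the library's internal code, and the helper name and the upper_bound of 0.01 are assumptions for the example.

```python
import numpy as np

def group_square_losses(y_true, y_pred, groups, min_val, max_val):
    """Mean squared loss per group, after clipping to [min_val, max_val]."""
    y_true = np.clip(y_true, min_val, max_val)
    y_pred = np.clip(y_pred, min_val, max_val)
    losses = (y_true - y_pred) ** 2
    return {name: losses[mask].mean() for name, mask in groups.items()}

# Made-up targets, predictions and group membership
y_true = np.array([0.1, 0.2, 0.6, 0.9])
y_pred = np.array([0.1, 0.3, 0.4, 0.9])
groups = {
    "group_a": np.array([True, True, False, False]),
    "group_b": np.array([False, False, True, True]),
}

# Hypothetical bound, looser than the one used in the tutorial
upper_bound = 0.01
losses = group_square_losses(y_true, y_pred, groups, min_val=-0.1, max_val=1.3)

# The constraint holds for a group when its mean loss is within the bound
satisfied = {name: loss <= upper_bound for name, loss in losses.items()}
print(losses)
print(satisfied)
```

Here group_a meets the bound while group_b does not, which is the kind of gap the mitigator tries to close during training.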

7. Results comparison

Now that we have seen how the mitigator is implemented in the model, we will compare the results of the baseline and the mitigated model to analyse how the metrics have changed.


result = pd.concat([df_baseline, df_exp_grad_w_p], axis=1).iloc[:, [0,2,1]]
result.columns = ['Baseline','Mitigator', 'Reference']
result
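The column selection in the snippet above is easier to follow on a toy example. The sketch below assumes that each metrics table has a "Value" column and a "Reference" column; that layout, the two metric names and all the numbers are placeholders for illustration, not results from this tutorial.

```python
import pandas as pd

# Placeholder tables mimicking the assumed layout: Value and Reference
metrics = ["Statistical Parity", "RMSE Ratio"]
df_baseline = pd.DataFrame(
    {"Value": [-0.60, 0.70], "Reference": [0, 1]}, index=metrics
)
df_mitigated = pd.DataFrame(
    {"Value": [-0.30, 0.85], "Reference": [0, 1]}, index=metrics
)

# After concat the columns are: Value, Reference, Value, Reference.
# Positions [0, 2, 1] keep both Value columns and one Reference column.
result = pd.concat([df_baseline, df_mitigated], axis=1).iloc[:, [0, 2, 1]]
result.columns = ["Baseline", "Mitigator", "Reference"]
print(result)
```

Keeping the reference column last makes it easy to read each row from left to right and judge which model is closer to the ideal value.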
[Table: bias metrics of the baseline and the mitigator, with reference values]

Certainly an improvement! From the previous chart, we can see that although the actual metrics are still far from the ideal values, we can obtain metrics closer to the ideal values if we apply a bias mitigator compared with our baseline.

Now that we have seen how to perform bias mitigation with the Black race as the protected attribute, we can apply the same analysis to the remaining races.

For the White race attribute:

[Figure: results for the White race attribute]

For the Asian race attribute:

[Figure: results for the Asian race attribute]

For the Hispanic race attribute:

[Figure: results for the Hispanic race attribute]

As we can see in the previous charts, the metrics vary according to the selected protected attribute; this is expected, since every protected attribute has its own distribution. The mitigator's performance also depends on the chosen hyperparameters, so we will have to vary them to obtain a fairer model. However, do not forget that you will face the trade-off between accuracy and fairness.

Additionally, we can analyse how the metrics are affected by varying the hyperparameters of the method. For example, this method allows us to set different hyperparameters such as constraints, losses and so on; you will find more in its documentation.

For our analysis, we will reuse the training and testing data defined previously as well as the protected attribute (in this case, the "Black" race), and the hyperparameter that we will vary is the loss function.

[Table and figures: metrics obtained with the different loss functions]

As we can see in the above chart and graphs, the choice of loss changes the method's performance. For example, we can observe that, in general terms, we achieve better fairness metrics with the “ZeroOne” loss, but the RMSE increases.

In general, the selection of the model parameters will depend on our main objective, whether we are looking for fairness or accuracy.

Summary

In this article, we have seen how to use the holisticai library to measure and mitigate the bias present in a regression task for the well-known Communities and Crime dataset. We were able to analyse the bias present in the model easily by using the regression_bias_metrics function, and for the mitigation step we implemented a technique called "Exponentiated gradient" that improved the fairness metrics with respect to the selected protected attribute.

In addition, we have shown how to implement a simple pipeline to perform the complete analysis while varying the protected attributes of the model. If you want to follow the complete tutorial, you can do so here. Don’t forget to review the documentation page of the library if you want to try other bias mitigation methods.

DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.
