Holistic AI Library Tutorial: Bias Measuring and Mitigation in Regression Tasks

Authored by

Researcher at Holistic AI

Published on

May 30, 2023

read time

min read

Bias-free regression analysis: a toolkit for detection and mitigation strategies

There are various techniques to measure bias in regression tasks. One common approach is to use fairness metrics such as demographic parity, equalized odds, or equal opportunity, which quantify the differences in the model's performance across different subgroups based on sensitive attributes such as gender or race.

Once bias is detected, we can employ different techniques to mitigate it. These methods can be grouped into three categories: Pre-processing, in-processing and post-processing methods. Pre-processing techniques are used to adjust the training data to remove bias, while in-processing methods are applied to build robust models against bias. Finally, post-processing techniques are used to adjust the model's predictions to remove bias.

In this article, we demonstrate tools which can be easily applied to measure and mitigate the presence of bias in regression models. This tutorial is based on the set of tutorials for regression task of the HolisticAI’s repository.

The tutorial will follow these stages:

Data loading and packages installation

Dataset preprocessing

Data analysis

Model training

Bias measuring

Bias mitigation

Results comparison

1. Data loading and packages installation

First, we need to install the required packages to perform our bias analysis and mitigation, in this case, we will use the holisticai package, this library can be installed by running the following command:


!pip install holisticai

Now we need to import the required packages:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from urllib.request import urlopen
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

The dataset that we will use is the "Communities and Crime" dataset which is a publicly available dataset that contains socio-economic and law enforcement data for 1994 communities in the United States. The objective is to predict the crime rate per capita in each community with information that contains demographic variables such as population size, race, and education level, as well as variables related to law enforcement.

This dataset can be downloaded from its original repository or can be imported from the holisticai library:


from holisticai.datasets import load_us_crime
dataset = load_us_crime(return_X_y=False, as_frame=True)
df = pd.concat([dataset["data"], dataset["target"]], axis=1)
df.head()

A simple inspection shows us that there are some rows which present missing data. Therefore, we need to perform a data pre-processing step before feeding it into the machine learning model.

2. Dataset preprocessing

Next, we need to identify how much data is missing to take a decision on whether to remove it or fill it with some values. Given that we saw that there are several columns with missing data we can simply drop them as well as some of the instances.


df_clean = df.iloc[:,[i for i,n in enumerate(df.isna().sum(axis=0).T.values) if n<1000]]
df_clean = df_clean.dropna()
df_clean.head()

3. Data analysis

Once the dataset has been preprocessed, we can start defining the protected groups, to do that we need to perform some analysis of the race features to define the protected attribute.

Therefore, we will inspect which races we can find in the dataset:


cols = [c for c in df.columns if c.startswith('race')]
print(cols)
['racepctblack', 'racePctWhite', 'racePctAsian', 'racePctHisp']

As we can see, exist four categories in the races attribute that can be used for our purpose. For now, we will select the racepctblack column as the protected attribute to begin with the bias analysis. Now that we have defined our protected group, lets remove the unnecessary columns from the dataset.


cols = [c for c in df_clean.columns if (not c.startswith('race')) and (not c.startswith('age'))]
df_clean = df_clean[cols].iloc[:,3:]
df_clean.head()

Now, let's see which are the ten columns with the highest correlations with the objective column.

From this graph, we can see that interestingly the feature that is most closely related to the goal attribute is the "percentage of children born to never married individuals" (PctIlleg feature), then appear followed by other interesting features such as males and females who are divorced, people under the poverty level, and unemployed people, among others.

Now Next we will define our training and testing datasets:


X = df_clean.values[:,:-1]
y = df_clean.values[:,-1
]X_train,X_test,y_train,y_test, group_a_tr, group_a_ts, group_b_tr, group_b_ts = \
    train_test_split(X, y, group_a, group_b, test_size=0.2, random_state=42)
train_data = X_train, y_train, group_a_tr, group_b_tr
test_data  = X_test, y_test, group_a_ts, group_b_ts

4. Model training

With the training and testing sets at hand, we can train our ML model as usual, this model will serve as the baseline for later analysis and comparison. Since the protected groups were separated from the dataset previously, we do not need to take care of that in this opportunity at this time, but do not forget to separate them from the dataset, so that the model does not have any influence from these attributes in its training process. To do the training, we will use a traditional pipeline, we will fitting and re-scalinge the training data, and then we will use the data to train a "Linear regression" model, and once the model has been trained, we can use its predictions to calculate the fairness metrics of it.


X, y, group_a, group_b = train_data
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Train a simple linear regression model
LR = LinearRegression()
model = LR.fit(X, y)
X, y, group_a, group_b = test_data
X = scaler.transform(X)
# Predict values
y_pred = model.predict(X)
# Calculate the error
baseline_rmse = mean_squared_error(y, y_pred, squared=False)
print("RMS error: {}".format(baseline_rmse))

5. Bias measuring

The fairness of the model can be calculated using the predictions of the model and the protected groups defined previously. The holisticai library contains a module that calculates a set of metrics useful in evaluating the fairness of algorithmic decisions. For In our case, we will use the regression_bias_metrics function which allows us to select which metrics we want to calculate, if equal_outcome, equal_opportunity or both, where equal_outcome shows how disadvantaged groups are treated by the model and equal_opportunity shows if all the groups have the same opportunities.


from holisticai.bias.metrics import regression_bias_metrics
df = regression_bias_metrics(
    group_a,
    group_b,
    y_pred,    y,
    metric_type='both'
)
y_baseline = y_pred.copy()
df_baseline=df.copy()
df_baseline

All of these metrics indicate different aspects for the fairness of the ML models, for example:

Disparate Impact Q: Shows the ratio of success rates between the protected groups for a certain quantile. Values below 1 are unfair towards group_a. Values above 1 are unfair towards group_b. The range (0.8, 1.2) is considered acceptable.
Statistical parity: Computes the difference in success rates between the protected groups. Values below 0 are considered unfair towards group_a while values above 0 are considered unfair towards group_b.‍
Average score difference: Computes the difference in average scores between the protected groups. Negative values indicate that group_a has a lower average score, so bias against group_a, while positive values indicate group_b has a lower average score, so bias against group_b.
‍Z score difference: Computes the spread in Zscores between the protected groups, the Zscore is a normalised version of Disparate Impact.
‍Max Statistical Parity: Computes the maximum overall thresholds of the absolute statistical parity between the protected groups, values below 0.1 in absolute value are considered acceptable.
‍RMSE ratio: Computes the RMSE for the protected groups, lower values show bias against group_a while higher values show bias against group_b.
‍MAE ratio: Like the previous metric, computes the MAE for the protected groups, lower values show bias against group_a while higher values show bias against group_b.
‍Correlation difference: Computes the difference in correlation between predictions and targets for the protected groups, positive values show bias against group_a while negative values show bias against group_b.

6. Bias mitigation

From the previous chart, we can observe that the model metrics are far away from the desired values, therefore, we need to apply some kind of strategy to mitigate the bias present in the model.

There Exists exist several different kinds of strategies, and the literature has divided them into three categories: Pre-processing, in-processing and post-processing methods. The holisticai library allows you to implement different algorithms from these categories for bias mitigation, you can look at them in the documentation page. An interesting feature is that all of them are compatible with the Scikit-learn package, so that, if you are familiar with this package, you will not have problems using the library. In this opportunity, we will use the "pipeline" module to perform the model implementation.

In this opportunity example to perform the mitigation, we will implement the "Exponentiated gradient reduction" method which is an in-processing technique that reduces fair classification to a sequence of cost-sensitive classification problems and returns a randomized classifier with the lowest empirical error subject to fair classification constraints.


from holisticai.pipeline import Pipeline
from holisticai.bias.mitigation import ExponentiatedGradientReductio

inprocessing_model = ExponentiatedGradientReduction(constraints="BoundedGroupLoss",
                                         loss='Square', min_val=-0.1, max_val=1.3, upper_bound=0.001,
                                         ).transform_estimator(model)
 pipeline = Pipeline(
    steps=[
        ('scalar', StandardScaler()),
        ("bm_inprocessing", inprocessing_model),
    ]
)‍

X, y, group_a, group_b = train_data
fit_params = {
    "bm__group_a": group_a,
    "bm__group_b": group_b
}‍

pipeline.fit(X, y, **fit_params)‍

X, y, group_a, group_b = test_data
predict_params = {
    "bm__group_a": group_a,    "bm__group_b": group_b,
}
y_pred = pipeline.predict(X, **predict_params)
df_exp_grad_w_p = regression_bias_metrics(
    group_a,
    group_b,
    y_pred,
    y,
    metric_type='both'

7. Results comparison

Now that we could can observe how is the implementation of the mitigator in the model, we will compare the results between the baseline and the implementations with the mitigator to analyse how the metrics have changed.


result = pd.concat([df_baseline, df_exp_grad_w_p], axis=1).iloc[:, [0,2,1]]
result.columns = ['Baseline','Mitigator', 'Reference']
result

Certainly an improvement! From the previous chart, we can see that although the actual metrics are still far from the ideal values, we can obtain metrics closer to the ideal values if we apply a bias mitigator compared with our baseline.

Now that we could have seen how to perform bias mitigation for the Black race as the protected attribute, we can apply the same analysis to the remaining races.

For the White race attribute:

For the Asian race attribute:

For the Hispanic race attribute:

As we can see in the previous charts, the metrics will vary according to the protected attribute selection, this is expected since every protected attribute possesses its own distribution. On the other hand, the mitigators performance will also depend on the chosen hyperparameters, therefore we will have to vary them to obtain a fairer model. However, do not forget that you will face the accuracy and fairness trade-off.

Additionally, we can analyse how the metrics are affected by varying the hyperparameters of the method. For example, this method allows us to set different hyperparameters such as constraints, losses and so on; you will find more in its the documentation here.

For our analysis, we will reuse the training and testing data defined previously as well as the protected attribute, in this case "Black" race, and the hyperparameter that we will vary will be the loss function.

‍

As we can see in the above chart and graphs, the selection of different losses for this method will vary its performance, for example, we can observe that in general terms we achieve a better model with the “ZeroOne” loss, but the RMSE increases.

In general, the selection of the model parameters will depend on our main objective, whether we are looking for fairness or accuracy.

Summary

In this article, we have seen how to use the holisticai library to measure and mitigate the bias present in a regression task for the well-known Communities and Crime dataset. We were able to analyse the bias present in the model easily by using the regression_bias_metrics function and for the mitigation step we could implement a technique called "Exponentiated gradient" that could improve the fairness metrics with respect to the selected protected attribute.

In addition, we have shown how to implement a simple pipeline to perform all the steps to perform the complete the analysis varying the protected attributes of the model. If you want to follow the complete tutorial, you can do so here. Don’t forget to review the documentation page of the library if you want to try other bias mitigation methods.

DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.

Holistic AI OSL Library

Table of contents

Heading 2