In an era where machine learning models are making significant decisions that affect people's lives, ensuring fairness in these models is not just a priority, it's a necessity. Balancing fairness with accuracy, however, is challenging without the appropriate techniques in place, especially in regression models.
In a previous blog, we explored the fairness-accuracy trade-off for classification models, analysing how much accuracy these models retain when mitigation methods are applied to address potential biases.
Taking this analysis as inspiration, in this blog post we will apply a similar approach to observe the trade-off phenomenon but for regression models instead.
The main motivation of this analysis is to find the configurations for which the model achieves its best performance with and without mitigators, and then to use the results to decide which algorithm to use for a given task, or which algorithms provide the most balanced trade-off between accuracy and fairness.
Before we start our analysis, let's briefly recap our understanding of the fairness-accuracy trade-off. As explored in our previous blog, besides the optimal accuracy objective targeted by researchers over the years, mitigation methods have also been developed to reduce possible bias within models. However, although these techniques are effective at making models fairer, most of the time this ethical improvement comes at the expense of other metrics, such as accuracy.
Consequently, this problem posed the following question: how do we find the mitigator that provides the best balance between fairness and accuracy for a certain task?
As we presented, this gap was addressed by Haas through an interesting framework that helps to determine the most suitable model. It first applies optimisation techniques to find the best model configuration for each mitigator, and then uses a cost function over the metric values to determine the best candidate for the task.
Given the interesting results that we obtained from this approach in the classification tasks, we will now extend it to regression tasks.
Regression models are those intended for predicting continuous values, such as house or stock prices. To illustrate the fairness-accuracy trade-off in regression models, we will consider the well-known "Communities and Crime" dataset, which contains socio-economic data used to predict the crime rate in communities in the United States. Then, we will follow the guidelines presented in our previous blog to perform the trade-off analysis by training and optimising the models for the presented case study.
First, we will need three Python packages: DEAP for the multi-objective optimisation, Scikit-learn for model training and the accuracy metric, and holisticai for the mitigators and fairness metrics.
Furthermore, we will follow the same pipeline:
Remember that you can find the complete implementation of this case study in the following link.
For our analysis, as explained, we will use the "Communities and Crime" dataset from the UCI Machine Learning repository. This is a publicly available dataset which contains socio-economic and law enforcement data for 1,994 communities in the United States. It includes demographic variables such as population size, race, and education level, as well as variables related to law enforcement. The objective is to predict the crime rate per capita in each community. The protected attribute we will use in this analysis is the percentage of the population that is Caucasian ("racePctWhite").
After preprocessing the dataset with the protected group and selecting the training and testing sets, we need to determine which metrics we will use to define the objective function in the optimisation stage. Given that we are now working with regression models, a good option is to use an error measurement to assess the accuracy of the model. For our purpose, we will select the mean squared error (MSE) as the accuracy metric for this analysis.
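As a quick reminder of what the accuracy objective measures, the sketch below computes the MSE by hand with NumPy. The input values are toy numbers chosen for illustration, not values from the dataset, and in practice Scikit-learn's own implementation would normally be used.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared residuals; 0 means a perfect fit."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy values for illustration only (not taken from the dataset).
mse = mean_squared_error([0.2, 0.5, 0.9], [0.1, 0.5, 0.7])
```

Because the residuals are squared, large errors are penalised more heavily than small ones, and lower values are better.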
On the other hand, in similar fashion to the previous blog, we will select the "max absolute statistical parity" for regression as the fairness metric. This function computes the absolute statistical parity between the protected groups over a range of thresholds and returns the maximum. The only point to keep in mind is that its lower bound, 0, is also the desired value.
As expected, model selection varies according to the objective task. For simplicity, since we are dealing with a regression problem, we will choose the Ridge regressor from the Scikit-learn package. This model imposes a penalty on the size of the coefficients by minimising a penalised residual sum of squares, with a complexity parameter that controls the strength of the penalty.
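To make the fairness objective more concrete, here is a minimal NumPy sketch of the idea behind a maximum absolute statistical parity for regression. It is an illustration under stated assumptions, not the holisticai implementation: the function name, the choice of prediction quantiles as thresholds, and the number of thresholds are all assumptions made here.

```python
import numpy as np

def max_abs_statistical_parity(group_a, group_b, y_pred, n_thresholds=19):
    """Largest gap, over a grid of thresholds, between the two groups'
    rates of receiving a prediction above the threshold."""
    group_a = np.asarray(group_a, dtype=bool)
    group_b = np.asarray(group_b, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumption: thresholds are evenly spaced quantiles of the predictions.
    thresholds = np.quantile(y_pred, np.linspace(0.05, 0.95, n_thresholds))
    gaps = []
    for t in thresholds:
        rate_a = np.mean(y_pred[group_a] > t)  # share of group A above t
        rate_b = np.mean(y_pred[group_b] > t)  # share of group B above t
        gaps.append(abs(rate_a - rate_b))
    return float(max(gaps))
```

A value of 0 means both groups exceed every tested threshold at the same rate, which is why the optimisation tries to push this quantity down.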
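The penalised objective can be illustrated with a closed-form solution in NumPy. This is a sketch of the underlying idea only, assuming the standard objective of squared residuals plus `alpha` times the squared L2 norm of the coefficients; it is not how Scikit-learn's solvers are implemented, and `fit_ridge` is a hypothetical helper name.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Minimise ||y - Xw - b||^2 + alpha * ||w||^2 (intercept b unpenalised)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Centre the data so the intercept can be recovered separately.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Normal equations with the penalty added on the diagonal.
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept
```

Increasing `alpha` shrinks the coefficients towards zero, which is the role of the complexity parameter described above; in Scikit-learn the same quantity is the `alpha` argument of `Ridge`.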
Moreover, to perform the presented trade-off analysis, we will implement two bias mitigation techniques, one for pre-processing and one for post-processing, besides the model without any kind of mitigation.
Given their fast processing time and good results, we will implement the Correlation Remover and the Wasserstein Barycenter methods.
Continuing with the trade-off analysis, the next step is to perform an optimisation process by solving a multi-objective optimisation problem with an evolutionary technique such as Genetic Algorithms (GA).
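For intuition about what these two mitigators do, the sketch below gives simplified NumPy versions of each idea. Both functions are assumptions made for illustration and are not the holisticai implementations: the pre-processing step removes the linear component of each feature that is explained by the sensitive attribute, and the post-processing step maps each prediction onto the one-dimensional Wasserstein barycenter of the per-group prediction distributions. The `groups` array (one discrete group label per sample) is also an assumption.

```python
import numpy as np

def remove_correlation(X, s):
    """Pre-processing sketch: subtract from every feature its least-squares
    projection onto the centred sensitive attribute(s)."""
    X = np.asarray(X, dtype=float)
    s = np.asarray(s, dtype=float).reshape(len(X), -1)
    s_centered = s - s.mean(axis=0)
    weights, *_ = np.linalg.lstsq(s_centered, X, rcond=None)
    return X - s_centered @ weights

def barycenter_adjust(y_pred, groups):
    """Post-processing sketch: send each prediction to the same quantile of
    the weighted average of the groups' quantile functions."""
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()
    adjusted = np.empty_like(y_pred)
    for g in labels:
        mask = groups == g
        vals = y_pred[mask]
        # Empirical rank of each prediction inside its own group, in (0, 1).
        ranks = (np.argsort(np.argsort(vals)) + 0.5) / len(vals)
        # Barycenter quantile function: weighted mean of per-group quantiles.
        adjusted[mask] = sum(
            w * np.quantile(y_pred[groups == h], ranks)
            for h, w in zip(labels, weights)
        )
    return adjusted
```

The first function changes the inputs before training, while the second only changes the outputs of an already trained model, which is why the two can be compared against the same unmitigated baseline.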
In this case, our intention is to minimise the error of the model and minimise the max statistical parity, meaning we must deal with a minimisation problem.
To make the evolutionary process as simple as possible, we will vary only some of the hyperparameters of the regression model, which will be used as the chromosomes for the GA. These hyperparameters are the penalisation parameter, the number of iterations and the solver type. Furthermore, we will leave the bias mitigator parameters at their default values.
With all of these guidelines defined, we can run the optimisation process with the DEAP package to determine the best candidates for each of the three variants, evaluating the fitness function on the chromosomes at every generation. We will run the process for 20 generations with a population of 100 individuals.
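One possible way to encode these three hyperparameters as a chromosome is sketched below. The gene ranges, the log scale for the penalisation parameter and the helper names are assumptions made for illustration; the solver names are the ones accepted by Scikit-learn's `Ridge`.

```python
import random

# Solver names accepted by sklearn.linear_model.Ridge (excluding "auto").
SOLVERS = ["svd", "cholesky", "lsqr", "sparse_cg", "sag", "saga"]

def random_chromosome(rng):
    """Three genes: log10(alpha), max_iter, and an index into SOLVERS.
    The ranges below are hypothetical."""
    return [rng.uniform(-3.0, 2.0), rng.randint(100, 2000), rng.randrange(len(SOLVERS))]

def decode(chromosome):
    """Turn a chromosome into keyword arguments for the regressor."""
    log_alpha, max_iter, solver_idx = chromosome
    return {
        "alpha": 10.0 ** log_alpha,
        "max_iter": int(max_iter),
        "solver": SOLVERS[int(solver_idx) % len(SOLVERS)],
    }

params = decode([0.0, 500, 2])
```

Searching the penalisation parameter on a log scale is a common choice because useful values often span several orders of magnitude.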
After completing it, we will repeat the process for the remaining models with mitigation.
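The overall shape of such an evolutionary loop can be sketched with the standard library alone. To be clear, this is a simplified stand-in and not the DEAP API or the selection scheme used in the case study: it uses Gaussian mutation only, ranks individuals by how many others dominate them, and replaces the real fitness function with a hypothetical toy function whose two objectives conflict.

```python
import random

def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def toy_objectives(individual):
    # Hypothetical stand-in for (MSE, max statistical parity): both are
    # minimised, but they pull the single gene in opposite directions.
    x = individual[0]
    return (x ** 2, (x - 2.0) ** 2)

def evolve(evaluate, pop_size=100, generations=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5.0, 5.0)] for _ in range(pop_size)]
    for _ in range(generations):
        # Variation: every parent yields one Gaussian-mutated child.
        children = [[g + rng.gauss(0.0, 0.3) for g in ind] for ind in population]
        pool = population + children
        scores = [evaluate(ind) for ind in pool]
        # Rank by the number of pool members dominating each individual.
        dom_count = [sum(dominates(other, s) for other in scores) for s in scores]
        order = sorted(range(len(pool)), key=lambda i: dom_count[i])
        population = [pool[i] for i in order[:pop_size]]
    return population

final_population = evolve(toy_objectives)
```

In the real pipeline, `evaluate` would train the model with the decoded hyperparameters (and the chosen mitigator) and return the measured error and fairness values, and the loop would be run once per variant.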
Once we have performed the optimisation of the models, we can take the candidates and then select the model that provides us with the best trade-off between accuracy and fairness.
To do this, we will first plot the Pareto frontier to observe how the best candidates for each approach are performing.
As we can observe in the previous graph, for this particular case, the methods perform quite differently from a practical perspective. While one of them presents better accuracy (the model without mitigation), another method presents fairer results. This is an interesting result because it shows a negative correlation between accuracy and fairness for this particular case.
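Extracting the frontier from a set of evaluated candidates can be sketched as follows. The function name and the candidate values are made up for illustration (they are not results from this case study); each point is an (error, unfairness) pair and both objectives are minimised.

```python
def pareto_front(points):
    """Return the points that no other point dominates, sorted by error."""
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical (MSE, max statistical parity) pairs.
candidates = [(0.020, 0.60), (0.025, 0.40), (0.030, 0.45), (0.040, 0.10), (0.050, 0.12)]
front = pareto_front(candidates)
```

Plotting the points in `front` for each variant on the same axes gives the kind of comparison discussed here: candidates further down and to the left are better on both objectives.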
Now, we will determine the best model by evaluating their metric results with the cost function proposed by Haas.
The following table summarises the results after the application of the cost function for the different cases:
As we can see, the architecture that presents the best fairness-accuracy trade-off is the Ridge Regression with Wasserstein Barycenters mitigation method since it displays better results compared to the other tested architectures (notice that lower is better for this case). This conclusion is valid for all the scenarios of the cost function (equal weighting, more weight for accuracy, and more weight for fairness).
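The three weighting scenarios can be illustrated with a simple weighted sum of normalised metrics. This is only an assumed form chosen for illustration and should not be read as the cost function defined by Haas; the candidate names and values are hypothetical and are not the results reported in this post.

```python
def normalise(values):
    """Min-max scale a list of values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def weighted_cost(error, unfairness, w_accuracy=0.5):
    """Lower is better; w_accuracy sets how much the error term counts."""
    return w_accuracy * error + (1.0 - w_accuracy) * unfairness

# Hypothetical candidates: name -> (MSE, max statistical parity).
candidates = {
    "model_a": (0.020, 0.60),
    "model_b": (0.030, 0.30),
    "model_c": (0.050, 0.10),
}
names = list(candidates)
errors = normalise([candidates[n][0] for n in names])
gaps = normalise([candidates[n][1] for n in names])

best = {}
for label, w in [("equal", 0.5), ("accuracy-heavy", 0.8), ("fairness-heavy", 0.2)]:
    costs = {n: weighted_cost(e, g, w) for n, e, g in zip(names, errors, gaps)}
    best[label] = min(costs, key=costs.get)
```

In this toy example the preferred candidate changes with the weighting, which is what makes it notable when a single method wins under all three scenarios.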
Through this tutorial, we have explained how to evaluate different approaches to determine the architecture that presents the best trade-off between accuracy and fairness for the regression case by following the framework proposed by Haas.
This framework allows the selection of different approaches and then evaluates them by defining a fitness function that contains an accuracy and a fairness metric, which is then optimised through an evolutionary algorithm (GA for our case).
The resulting candidates for best models of this optimisation are then evaluated with a cost function that combines both metrics (accuracy and fairness). It is here where the best model is finally determined.
Again, we suggest reading the original publication to find more details of the framework. In this tutorial, we have used the "max statistical parity" metric from the "holisticai" package, but feel free to experiment with the wide range of metrics that you can find in our open-source library, and the same goes for testing other accuracy metrics.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.