Recommender systems – which feed their algorithms with user preferences and historical behaviour – have become an integral part of our online experiences, guiding our decision-making and suggesting personalised content, products, or services.
However, these systems can systematically favour some options over others, leading to unfair or discriminatory outcomes: the bias problem. As with other machine learning tasks such as classification and regression, bias can originate in data collection, algorithmic design, or user behaviour.
Amazon's AI recruitment tool starkly displayed the real-world impacts of bias in recommender systems when, in 2018, it was revealed that it systematically downranked women's CVs for technical roles like software developer, reflecting wider gender imbalances rather than candidate qualifications.
This demonstrated how unseen biases can easily become baked into AI systems, underscoring the urgent need to proactively address algorithmic fairness. By jeopardising equal opportunities, such biases violate principles of non-discrimination and highlight why mitigating unfairness is critical for recommender systems influencing real lives.
To ensure recommender systems provide equitable and unbiased outcomes, it is essential to measure and mitigate bias. The user-friendly Holistic AI Python package enables this by allowing users to quantify bias and apply mitigation algorithms within machine learning models. By leveraging this toolkit, we can build more inclusive and fair platforms that improve the user experience by reducing bias.
In this tutorial, we will show how to train a basic recommender system, calculate its bias metrics with the holisticai package, and apply a mitigator to compare the new results with our baseline. To do this, we will use the well-known "Last FM Dataset" from the holisticai library. This dataset – which includes user information such as sex and country – records the artists downloaded by each user. The objective of the recommender system is to suggest artists based on these user interactions.
First, we must import the required packages to perform our bias analysis and mitigation. You will need the holisticai package and its dependencies installed on your system. You can install the package by running:
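```
pip install holisticai
```

With the package installed, the imports below sketch what the rest of this tutorial assumes is available: numpy and pandas for data handling, and the recommender_bias_metrics function from holisticai. The exact module paths may vary between holisticai releases, so check the documentation for your installed version.

```python
import numpy as np
import pandas as pd

# Bias measurement utilities from holisticai
# (module path assumed from the holisticai documentation)
from holisticai.bias.metrics import recommender_bias_metrics
```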
The dataset that we will use is the "Last FM Dataset", a publicly available dataset containing a set of artists downloaded by users, along with personal information about each user, specifically their sex and country of origin. A user can download more than one artist. We will use the "score" column, which contains only 1s, to count the interactions.
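As a sketch of how the data might be loaded, the snippet below assumes the dataset is exposed through a load_last_fm helper in holisticai.datasets that returns a bunch containing a pandas frame with user, artist, sex, country and score columns; the loader's name and return type may differ in your installed version, so adapt as needed.

```python
# Load the Last FM data bundled with holisticai
# (loader name, returned structure and column names are assumptions)
from holisticai.datasets import load_last_fm

bunch = load_last_fm()
df_lastfm = bunch["frame"]   # expected columns: user, artist, sex, country, score
print(df_lastfm.head())
```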
Next, we preprocess the dataset before feeding it into the model. For this step, we will define a function that will clean the dataset, create the pivot matrix, and separate the protected groups according to a given feature:
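The original preprocessing code is not reproduced here; the function below is a minimal sketch assuming the frame has user, artist, score and sex columns and that "sex" is the protected attribute used to split the groups.

```python
def preprocess(df, protected_attribute="sex", protected_value="f"):
    """Clean the data, build the user x artist pivot matrix and split the
    users into protected groups according to `protected_attribute`."""
    df = df.dropna(subset=["user", "artist", protected_attribute]).copy()

    # One row per user, one column per artist; 'score' is always 1
    pivot = df.pivot_table(index="user", columns="artist",
                           values="score", aggfunc="max", fill_value=0)

    # Boolean masks for the protected groups, aligned with the pivot rows
    user_attr = df.groupby("user")[protected_attribute].first().loc[pivot.index]
    group_a = (user_attr == protected_value).to_numpy()  # e.g. female users
    group_b = ~group_a                                    # remaining users
    return pivot, group_a, group_b


# The value "f" for the protected group is an assumption about the encoding
pivot, group_a, group_b = preprocess(df_lastfm, "sex", "f")
```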
There are many ways to recommend artists to users. We will use item-based collaborative filtering, one of the simplest and most intuitive approaches. This method bases its recommendations on similarities between items, allowing us to derive and suggest a list of related artists for each user.
To do that, we will first define some utility functions to help us sort these recommendations:
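The original utility functions are not shown here; as an illustration, two small helpers like the following could rank artists and build a per-user recommendation list (the names and signatures are ours, not part of the holisticai package).

```python
def top_k_items(scores, k=10, exclude=None):
    """Return the k highest-scoring items, optionally dropping already-seen ones."""
    if exclude is not None:
        scores = scores.drop(index=exclude, errors="ignore")
    return scores.sort_values(ascending=False).head(k)


def recommend_for_user(user_row, item_similarity, k=10):
    """Score every artist by its total similarity to the artists the user has
    already interacted with, then return the top-k unseen ones."""
    seen = user_row[user_row > 0].index
    scores = item_similarity.loc[seen].sum(axis=0)
    return top_k_items(scores, k=k, exclude=seen)
```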
Now, we prepare the pivot table, calculate the item correlations, and perform the filtering to create a new pivot table from which we can extract recommendations for the users:
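The thresholds and the choice of Pearson correlation below are illustrative assumptions; the idea is simply to keep reasonably active users and popular artists so that the item-item correlations are meaningful.

```python
# Keep reasonably active users and popular artists before computing correlations
# (the thresholds of 5 and 10 interactions are illustrative choices)
active_users = pivot.index[pivot.sum(axis=1) >= 5]
popular_artists = pivot.columns[pivot.sum(axis=0) >= 10]
pivot_filtered = pivot.loc[active_users, popular_artists]

# Item-item (artist-artist) similarity from the filtered pivot table
item_similarity = pivot_filtered.corr(method="pearson").fillna(0)
```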
Finally, we obtain our recommendation matrix:
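Putting the helpers together, a binary user-by-artist matrix can be assembled, with a 1 wherever an artist is recommended to a user; this layout is our assumption about the mat_pred input expected by recommender_bias_metrics.

```python
TOP_K = 10

# Binary user x artist matrix: 1 where the artist is recommended to the user
recommendations = pd.DataFrame(0, index=pivot_filtered.index,
                               columns=pivot_filtered.columns)
for user, row in pivot_filtered.iterrows():
    recs = recommend_for_user(row, item_similarity, k=TOP_K)
    recommendations.loc[user, recs.index] = 1

mat_pred = recommendations.to_numpy()
```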
With the new recommendation matrix at hand, we can now calculate various metrics of fairness for recommender systems. In this example, we will cover item_based metrics by using the recommender_bias_metrics function:
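A minimal call, assuming recommender_bias_metrics accepts the prediction matrix as mat_pred and a metric_type argument, as in the holisticai documentation; adjust the parameters if your installed version differs.

```python
# Batch-compute the item-based recommender bias metrics for the baseline model
df_baseline = recommender_bias_metrics(mat_pred=mat_pred, metric_type="item_based")
print(df_baseline)
```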
Above, we have computed all item_based metrics for the recommender bias task in a single batch. For instance, observe that the Average Recommendation Popularity is 5609, meaning that, on average, a recommended artist has 5609 total interactions.
An interesting feature of this function is that it returns not only the metrics calculated from the predictions but also a reference value for each metric, corresponding to an ideally fair model. This helps us analyse the fairness of the predictions for the protected groups across the different metrics.
For our analysis, we are interested in the two following metrics:
Since these metrics are far from their desired reference values, we must apply a strategy to mitigate the model's bias.
There are three categories of mitigation strategy: "pre-processing", "in-processing" and "post-processing". The holisticai library contains algorithms from all three categories, and all of them are compatible with the Scikit-learn package, so if you are familiar with Scikit-learn you will have no trouble using the library.
For this, we will implement the "Two-sided fairness" method, an in-processing algorithm that maps the fair recommendation problem to a fair allocation problem. This method is agnostic to the specifics of the data-driven model (that estimates the product-customer relevance scores), making it more scalable and easier to adapt.
To perform the mitigation with this method, we will use the data matrix calculated earlier, together with the protected groups:
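The snippet below is an illustrative sketch rather than the library's verbatim API: it assumes holisticai exposes the two-sided fairness algorithm as FairRec in holisticai.bias.mitigation, with rec_size and MMS_fraction parameters, a fit method taking the user-by-artist relevance matrix, and a recommendation attribute holding each user's fair allocation. Check the documentation of your installed version, since these names may differ.

```python
from holisticai.bias.mitigation import FairRec  # class name assumed; see above

# User x artist relevance scores estimated from the collaborative filtering step
# (one possible choice of relevance input; adapt to your own pipeline)
data_matrix = pivot_filtered.to_numpy().astype(float) @ item_similarity.to_numpy()

# Two-sided fairness mitigator: 10 recommendations per user, with a fraction of
# each artist's maximin share of exposure guaranteed (parameter names assumed)
fair_rec = FairRec(rec_size=10, MMS_fraction=0.5)
fair_rec.fit(data_matrix)

# Rebuild a binary user x artist matrix from the fair allocation so the same
# metrics function can be reused (the attribute name is an assumption)
mat_pred_fair = np.zeros_like(data_matrix)
for user_idx, items in fair_rec.recommendation.items():
    mat_pred_fair[user_idx, items] = 1

df_mitigated = recommender_bias_metrics(mat_pred=mat_pred_fair,
                                        metric_type="item_based")
print(df_mitigated)
```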
We can observe that the mitigator improves the "Aggregate Diversity" metric, which now reaches its reference value, and that the remaining metrics also show a clear improvement. Let's now compare them with our baseline.
Now that we have seen how to apply the bias mitigator, we can compare its results with the baseline we implemented earlier to analyse how the metrics have changed:
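One way to compare is to place the two metric tables side by side; the "Value" and "Reference" column names below are assumptions about the DataFrame returned by recommender_bias_metrics.

```python
# Side-by-side comparison of baseline vs. mitigated metrics
comparison = pd.concat(
    [
        df_baseline["Value"].rename("Baseline"),
        df_mitigated["Value"].rename("Two-sided fairness"),
        df_baseline["Reference"],
    ],
    axis=1,
)
print(comparison)
```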
From this comparison, we can see that although some of the metrics are still far from their ideal values, applying this method to the data yields a clear improvement over our baseline.
In this tutorial, we have shown how the holisticai library can be used to measure bias in recommender systems through the recommender_bias_metrics function, which returns the calculated values for a range of metrics alongside their fair reference values.
We have also shown how to mitigate bias with the "Two-sided fairness" technique, an in-processing method that maps the fair recommendation problem to a fair allocation problem and is agnostic to the underlying relevance model.
By walking through concrete examples of how to quantify and reduce bias in a recommender system, we have demonstrated the feasibility and importance of promoting algorithmic fairness.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.