In machine learning, explainability metrics are crucial because they provide insight into how a model makes decisions, fostering transparency, trust, and accountability in AI systems.
One of the easiest ways to compute explainability metrics for machine learning models is through the Holistic AI Library, an open-source tool designed to assess and improve the trustworthiness of AI systems.
In this blog post, we will introduce the concept of permutation feature importance and explain how to build more transparent machine learning models with it. Additionally, we will showcase a practical application of explainability metrics.
Permutation feature importance is a valuable tool in the realm of machine learning explainability. Unlike model-specific methods, it is model-agnostic, meaning it can be applied to various types of predictive algorithms, such as linear regression, random forests, support vector machines, and neural networks. This universality makes it particularly useful when dealing with a diverse range of models, providing a platform to understand their inner workings.
The process of calculating permutation feature importance involves systematically shuffling the values of a single feature while keeping the remaining features unchanged. By re-evaluating the model’s performance after each shuffle, we can observe how much the permutation degrades the model’s predictive accuracy or chosen performance metric.
A feature that significantly affects the model’s performance will demonstrate a considerable drop in predictive accuracy when its values are permuted, highlighting its importance.
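As a concrete illustration of this shuffle-and-re-evaluate procedure, here is a minimal sketch using scikit-learn's `permutation_importance` on synthetic data. The dataset and model are placeholders, not the Adult-dataset setup used later in this post:

```python
# Minimal sketch of permutation feature importance with scikit-learn's
# model-agnostic permutation_importance; synthetic data stands in for
# any real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature n_repeats times and measure the drop in score.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)

for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature {i}: mean score drop = {mean_drop:.3f}")
```

Features the model actually relies on show a large mean score drop; irrelevant features show a drop near zero.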
In this application, we will train two classification models (Random Forest and AdaBoost) on the commonly used Adult dataset. The task posed by this dataset is as follows: given a set of financial and personal characteristics about a specific individual, how should we classify their income – do they earn more than 50K per year, or not?
The first step is simplified data preprocessing. The main change in this dataset will be to perform a transformation on the categorical variables using the `pd.get_dummies` function.
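As a sketch of this preprocessing step, the snippet below applies `pd.get_dummies` to a toy frame with Adult-style columns (the column names and values here are illustrative assumptions, not the full dataset):

```python
# Sketch of one-hot encoding categorical variables with pd.get_dummies.
import pandas as pd

df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Self-emp-not-inc", "Private"],
    "sex": ["Male", "Male", "Female"],
})

# Expand each categorical column into 0/1 indicator columns,
# leaving numeric columns such as age untouched.
encoded = pd.get_dummies(df, columns=["workclass", "sex"])
print(encoded.columns.tolist())
```

After this transformation, each category becomes its own indicator column (e.g. `workclass_Private`), which is the form most scikit-learn estimators expect.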
Next, we will use the Explainer class to compute the permutation feature importance of the model. After instantiating the Explainer class, we can calculate the metrics and generate the result plots.
We can set `strategy_type` to `permutation`, `surrogate`, `lime`, `shap`, or `shap-text`. In the following example, we use `permutation`.
A quick way to compute explainability metrics with the Holistic AI Library is to call the metrics function on the explainer object, which returns the feature importance metrics in a single step.
A partial dependence plot is a useful tool for understanding how an independent variable (or a set of variables) affects the prediction of a machine learning model while keeping other variables constant.
And how do we interpret the curve? If the curve goes up or down, it indicates the direction of the effect of the independent variable on the model’s response. If the curve is flat, the independent variable has relatively little impact on the model’s output. The slope of the curve shows the size of the effect – a steep slope indicates a strong effect, while a gentle slope indicates a weaker effect.
In this case, we can see that education-num and marital-status_Married-civ-spouse have a positive effect on predictions. Capital-gain, by contrast, strongly influences the model’s output up to a certain value, after which the curve flattens.
In this article, we highlighted a key feature of the Holistic AI library's explainability module: permutation feature importance. By computing specific explainability metrics and generating partial dependence plots for various features, we offered a method to enhance transparency in intelligent systems across diverse datasets and contexts.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.