The widespread use of machine learning (ML) models raises ethical questions about our ability to explain their behavior. Two questions stand out: why does the model make this prediction, and how does the model make this prediction? In essence, these two questions are at the center of the field of explainable AI.
In this guide, we’ll step you through some core concepts related to explainability, along with hands-on steps you can take using Holistic AI’s open-source Holistic AI Library, a toolkit built for such tasks.
Interested in taking an even deeper dive into explainability metrics? Check out our white paper on ML prediction metrics for explainability.
It is important that organizations using ML models, and the users affected by their decisions, can understand what prompted a model to make a given decision. Imagine you are a bank customer trying to secure a loan, and your request is denied. When you inquire further, the bank explains that it has adopted a machine learning system that generates a credit score based on historical data and individual customer characteristics. In this scenario, it is entirely reasonable to request a more detailed explanation of how the model operates.
We understand “explanation” as a visual or textual representation that provides a qualitative understanding of the relationship between the instance’s components and the model’s predictions.
This sense of “explanation” is illustrated below. We can observe that explanations help the agent (or user, developer…) interpret the model’s results and validate (or reject) the criteria used for prediction.
An output explanation framework helps teams consider not only a model’s accuracy, but also the features that contribute to its decision-making process. This is crucial because a model that assigns high importance to unexpected features might otherwise be selected over a model with consistent feature importance but slightly lower accuracy. Analyzing feature attributions is important to avoid making such unfair decisions.
Local explainability methods aim to explain the local effect of features on the outcome of a specific prediction, that is, an instance. An instance is a row or data point that we want to interpret. One of the most widely used strategies in the literature for this purpose is LIME: Local Interpretable Model-agnostic Explanations.
LIME is an explanation method designed to work with any classifier or regression model, and the explanation it produces can be represented as an additive feature attribution. In broad terms, LIME explains a particular instance by fitting an interpretable surrogate model that minimizes a “fidelity” loss (how poorly the surrogate mimics the original model near that instance) while keeping the surrogate’s complexity at a level that humans can interpret.
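For reference, the original LIME paper (Ribeiro et al., 2016) formalizes this trade-off as the following optimization:

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)
```

Here, f is the model being explained, G is a family of interpretable surrogates (for example, sparse linear models), \pi_x is a proximity kernel that weights perturbed samples by their closeness to the instance x, \mathcal{L} measures how unfaithful the surrogate g is to f in that neighborhood, and \Omega(g) penalizes the surrogate’s complexity.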
Holistic AI has released an open-source library of tools for assessing explainability across the machine learning lifecycle. Below, we’ll step through how you might explore the concepts above using the Holistic AI Library.
The first step in implementing explainability metrics is to install the library, load the required packages, and read the data. In this case, the library is installed with the explainability extras: pip install holisticai[explainability].
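A minimal setup sketch is shown below. The import path for the Explainer class follows the library’s explainability module, but treat it as an assumption and check the documentation for your installed version of holisticai.

```python
# Install the explainability extras (run once in your environment):
#   pip install holisticai[explainability]

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The Explainer class lives in the explainability module of the
# Holistic AI library; the exact import path may differ across versions.
from holisticai.explainability import Explainer
```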
After installing the HAI library, we load the dataset and split the data into train and test sets. In this tutorial, we use the Law School dataset, where the goal is to predict the binary attribute ‘bar’ (whether a student passes the bar exam). The protected attributes are race and gender. (A bias-measurement walkthrough on this same dataset is available in a separate tutorial.)
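The sketch below assumes the Law School data is available as a local CSV file; the file path and column names are illustrative assumptions, so adapt them to however you obtain the dataset.

```python
# Illustrative sketch: load the Law School data from a CSV file.
# The path and column names ("bar", "race1", "gender") are assumptions
# for this example; adjust them to match your copy of the dataset.
df = pd.read_csv("law_school.csv")

# 'bar' is the binary target (pass / fail); the protected attributes
# race and gender are kept out of the feature matrix here.
y = df["bar"]
X = df.drop(columns=["bar", "race1", "gender"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```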
Now a machine learning model can be trained to classify the students. In this case, we use Logistic Regression.
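A standard scikit-learn pipeline is enough for this step; the scaling step and hyperparameters below are just sensible defaults, not requirements of the tutorial.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Scale the features and fit a Logistic Regression classifier
# on the training split.
model = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Quick sanity check on held-out data.
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```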
Explainer class
The Explainer class is used to compute explainability metrics and generate the plots associated with them. A few parameters are important for a successful implementation.
The “based_on” parameter defines the family of strategies that will be used; in this case, we use strategies based on feature importance.
The “strategy_type” parameter selects the specific strategy within that family; here, we use LIME.
Additionally, we need to define the model type (binary_classification), the model object, the features used in training (X_train), and the targets used in training (y_train), as sketched below.
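The instantiation might look like the sketch below. The argument names follow the description in this tutorial; treat the exact signature as an assumption and verify it against the documentation of your installed holisticai version.

```python
# Sketch of instantiating the Explainer with the parameters described above.
explainer = Explainer(
    based_on="feature_importance",       # family of strategies
    strategy_type="lime",                # LIME feature importance
    model_type="binary_classification",  # task type
    model=model,                         # the fitted estimator
    x=X_train,                           # features used in training
    y=y_train,                           # targets used in training
)
```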
After instantiating the Explainer object for the model’s results, we can compute the metrics. The HAI library simplifies this process through the metrics function. In this example, we pass detailed=True to also see the results broken down by labels 0 and 1.
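In code, this step is a single call; the exact shape of the returned table depends on the library version.

```python
# Compute the explainability metrics; detailed=True also reports the
# per-label (0 and 1) breakdown described above.
metrics = explainer.metrics(detailed=True)
print(metrics)
```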
The Explainer object also provides plotting utilities. The following code snippets show a bar plot with the feature importance ranking and a box plot for data stability and feature stability.
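A plotting sketch is shown below. The method names are assumptions inferred from the plots this tutorial describes (a feature-importance bar plot and a stability box plot); confirm them against the Explainer API of your library version.

```python
import matplotlib.pyplot as plt

# Assumed method name: bar plot of the feature importance ranking,
# limited here to the ten most important features.
explainer.bar_plot(max_display=10)
plt.show()

# Assumed method name: box plot summarizing data stability and
# feature stability of the importance estimates.
explainer.box_plot()
plt.show()
```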
In this tutorial, we presented one feature of the Holistic AI library’s explainability module: LIME feature importance. We computed explainability metrics and generated several plots that make the model’s predictions easier to interpret. This illustrates how teams can begin applying these strategies to their own datasets and contexts.
Curious about what 360 AI safety means for your organization? Request a demo to explore our AI risk, governance, and safety platform.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.