At Holistic AI, we built AI governance from the ground up because it is not simply an extension of existing data or cloud governance platforms. I’ll explain this more in a future blog, but essentially AI’s scale and complexity are unique and require a lifecycle approach. Solutions that treat AI as an extension of data, IT, or cloud focus on managing resources, data classification, and cost reduction, and they will miss risks in key areas such as observability, compliance, and safety. It is the last of these, safety, that gives AI governance the properties of cybersecurity products.
AI governance is not a set-and-forget solution but a continuous process that must be updated, monitored, and evolved as business needs dictate. In cybersecurity, red teaming involves a white hat team intentionally testing system defenses. Similarly, in AI, red teaming aims to break an AI model’s guardrails, especially in Generative AI (GenAI).
Holistic AI has sophisticated red teaming capabilities to test a model’s safeguards, which are designed to protect against harmful behavior such as leaking sensitive information or generating content that is toxic, biased, or factually incorrect.
Red Teaming is the process of simulating adversarial scenarios to test the robustness, safety, and compliance of AI systems. It helps identify vulnerabilities, evaluate safeguards, and strengthen a model’s defenses against risks.
Holistic AI uses the concept of a Defense Success Rate (DSR) as a key indicator of a model’s effectiveness in handling challenging scenarios. The DSR quantifies the proportion of responses assessed as safe out of the total number of evaluated responses. A higher DSR suggests that the model is more robust and better equipped to generate safe and appropriate content.
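As a simple illustration (not the platform’s actual implementation), the DSR can be computed as follows, assuming each evaluated response has already been labeled SAFE or UNSAFE:

```python
def defense_success_rate(labels):
    """Compute the Defense Success Rate (DSR) from evaluator labels.

    Each label is expected to be either "SAFE" or "UNSAFE".
    Returns the proportion of SAFE responses as a percentage.
    """
    if not labels:
        raise ValueError("at least one evaluated response is required")
    safe = sum(1 for label in labels if label == "SAFE")
    return 100.0 * safe / len(labels)

# Example: 8 of 10 evaluated responses were judged safe -> DSR of 80.0
print(defense_success_rate(["SAFE"] * 8 + ["UNSAFE"] * 2))
```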
Just as there are multiple types of red teaming in the cybersecurity space, Holistic AI covers two types of red teaming in the AI governance space:
The Holistic AI Governance platform identifies vulnerabilities and weaknesses in the model’s responses to specific known challenges. By evaluating the model’s responses to these static prompts, developers can pinpoint specific areas for improvement. They can also refine the model’s training data and/or algorithms to enhance its performance and safety.
The Holistic AI platform tests four (4) categories of prompts against the client LLM. It then assesses the responses from the client LLM and classifies each response as SAFE or UNSAFE.
The four categories are:
These prompts are designed to assess the model’s ability to handle different kinds of attacks. While the text might appear random, each prompt is actually crafted as a specific variation within its category.
The prompts are further classified into topics and the Defense Success Rate is computed for each of these topics.
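To make this concrete, here is a minimal sketch of how static red teaming with per-topic scoring might look. The prompt structure and the `query_model` and `classify_response` functions are hypothetical stand-ins for the client LLM and the safety evaluator, not the platform’s actual code:

```python
from collections import defaultdict

def static_red_team(prompts, query_model, classify_response):
    """Send each static prompt to the model and compute a per-topic DSR.

    `prompts` is a list of dicts with "topic" and "text" keys.
    `query_model` calls the client LLM; `classify_response` stands in for
    the safety evaluator and returns "SAFE" or "UNSAFE".
    """
    labels_by_topic = defaultdict(list)
    for prompt in prompts:
        response = query_model(prompt["text"])
        labels_by_topic[prompt["topic"]].append(
            classify_response(prompt["text"], response)
        )
    # DSR per topic: percentage of responses judged SAFE
    return {
        topic: 100.0 * labels.count("SAFE") / len(labels)
        for topic, labels in labels_by_topic.items()
    }
```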
The Holistic AI platform supplements the static prompts with dynamic adversarial prompts based on specified keywords, topics, and themes. These test the model’s robustness, fairness, and ethical/corporate alignment. The generated prompts simulate edge cases, biases, misinformation, and other adversarial inputs to identify vulnerabilities, assess consistency, and ensure adherence to established standards.
The user configures how the dynamic prompts are generated by specifying the topics the prompts should relate to, along with the total number of prompts desired. The prompts are divided as evenly as possible among the topics (see the sketch below). The Holistic AI Prompt Generator uses this information to dynamically generate the prompts.
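For illustration only, splitting a requested prompt budget as evenly as possible across the chosen topics could look like this; the function name and example topics are hypothetical:

```python
def allocate_prompts(topics, total_prompts):
    """Divide a total prompt budget as evenly as possible across topics.

    Any remainder is distributed one extra prompt at a time to the first topics.
    """
    base, remainder = divmod(total_prompts, len(topics))
    return {
        topic: base + (1 if i < remainder else 0)
        for i, topic in enumerate(topics)
    }

# Example: 10 prompts over 3 topics -> {'bias': 4, 'privacy': 3, 'toxicity': 3}
print(allocate_prompts(["bias", "privacy", "toxicity"], 10))
```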
The user is given the opportunity to review (and download) the prompts. They can accept these prompts or regenerate them, either with the original configuration or with an entirely new one.
Once satisfied with the generated prompts, the user can start the audit of their model.
The client LLM responses are assessed by the Holistic AI Dynamic Red Teaming evaluator and again classified as SAFE or UNSAFE.
As is the case with Static Red Teaming, the DSR is computed for each topic.
One of the key tenets of the Holistic AI Governance Platform is that audit results alone are insufficient. The platform provides explanations for why each prompt response was classified as SAFE or UNSAFE.
The platform also provides actionable insights on how to improve the score of the model relative to other foundational models.
These include:
Red teaming should be performed regularly. However, most organizations don’t do so, simply because it typically requires significant resources that must be scheduled. With the Holistic AI Governance Platform, the tool does the heavy lifting for you; only one person is needed to run the tests.
So, what is a good cadence?
If the model hasn’t been modified, best practice is still to run the tests periodically, because we regularly add tests to the Holistic AI platform, and these new tests could uncover new vulnerabilities.
For new models (and an updated model should be treated as a new model), you should red team as soon as possible. This applies to both internal and third-party models. After the first run, repeat periodically as explained above.
Even if the model hasn’t been modified, we still recommend running tests periodically with the same topics, just to check for model drift. It might also be helpful to run them with a different set of prompts (and/or a different number of prompts).
Also, as usage of the model grows and new topics emerge that you feel you need to check for, simply add them as a new test.
For new models (again, an updated model should be treated as a new model), you should red team as soon as possible. Since the model is being used for a specific use case, the topics that matter for the test are likely different from those of other models you have tested. After the first run, repeat periodically.
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.
Schedule a call with one of our experts