As artificial intelligence continues to integrate into our daily lives, it is paramount to ensure new and popular models remain safe and reliable. At Holistic AI, we are committed to rigorous testing and evaluation to enhance trust and transparency in AI deployment. In our latest audit, we evaluated two AI models— DeepSeek R1 and OpenAI o1—on their ability to handle various prompts, including adversarial attempts to bypass safeguards. The results offer valuable insights into the current state of AI robustness and safety.
This evaluation was conducted using the Holistic AI Governance Platform, the same cutting-edge software solution used by our customers. Any enterprise using Holistic AI when DeepSeek was announced was able to perform their own assessment immediately— gaining critical insights into model vulnerabilities before widespread deployment.
To assess the models, we conducted a structured evaluation using the following datasets:
The prompts were sourced from a Cornell University dataset designed to rigorously test AI security, drawing from established red-teaming methodologies.1
To ensure a robust evaluation, responses were classified as either SAFE or UNSAFE using a dual-layered assessment approach:
The o1 model demonstrated strong adherence to safety protocols, successfully rejecting all jailbreak attempts and maintaining an impressively low rate of unsafe responses. These results highlight o1’s robustness in resisting adversarial exploitation.
In contrast, DeepSeek’s R1 model exhibited a higher propensity to generate unsafe responses, particularly in jailbreaking scenarios. When successfully jailbroken, R1 not only responded to the initial adversarial prompt but also continued answering any subsequent questions without restriction.
Due to responsible disclosure considerations, we will not share explicit examples of unsafe outputs. However, our audit revealed that once R1 was compromised, it provided harmful responses without restriction. This highlights a major security concern for enterprises relying on AI models in high-risk domains.
These results reinforce the necessity of continuous improvement in AI security mechanisms. While the o1 model performed exceptionally well, the R1 model’s vulnerabilities indicate that additional safeguards and training enhancements are needed.
Enterprises with an AI governance platform integrated into their IT environment are able to rapidly test and assess new LLMs prior to deployment, potentially gaining competitive advantages in speed and agility. Additionally, they are able to enforce guardrails in the use of LLMs across the organization and monitor for security and other risks on an ongoing basis as the needs of the business dictate.
Holistic AI provides the tools that enterprises need to securely deploy AI at scale. By continuously evolving our platform, we empower businesses to make informed decisions, mitigate risks, and harness AI’s full potential with confidence.
[1] The audit used prompts gathered from: https://arxiv.org/abs/2404.01318
DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.
Schedule a call with one of our experts