
Leading AI Model Claude Opus 4.6 Bypassed in 30 Minutes, Exposing Critical Security Gap in Agentic AI Systems


AIM Intelligence, a Seoul-based AI safety company, announced that its security research team bypassed the safety mechanisms of Anthropic’s Claude Opus 4.6, the company’s highest-performance AI model, within 30 minutes of its release on February 6. The jailbreak enabled the model to provide detailed instructions for manufacturing biochemical weapons, including sarin gas and the smallpox virus, highlighting critical vulnerabilities in current AI safety systems.

The findings come amid growing industry concern that safety mechanisms are failing to keep pace with rapidly advancing AI capabilities, particularly in agentic AI systems designed to make autonomous decisions and take actions on behalf of humans.

“This successful jailbreak demonstrates that even top-tier AI models share common security vulnerabilities,” said Ha-on Park, CTO of AIM Intelligence. “As attacks on AI systems become increasingly sophisticated and agentic capabilities expand, understanding and defending against these vulnerabilities will be critical for the industry.”

Systematic Vulnerabilities Across Leading Models

In controlled red-team testing (structured adversarial evaluations designed to surface latent AI safety failures), researchers at AIM Intelligence identified critical weaknesses in Claude’s refusal and containment mechanisms. Under specific prompt conditions, the model bypassed its safeguards and generated actionable, step-by-step guidance on prohibited biological threats, including anthrax and smallpox, pathogens universally classified as high-risk agents with severe real-world public health and national security implications.

These outputs went beyond abstract discussion or historical context, crossing into procedural framing that would normally be blocked by safety systems. The findings underscore how even state-of-the-art models can, when improperly constrained, surface knowledge that could be misused for bioterrorism, mass-casualty planning, or biological weapons development if accessed by malicious actors.
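
The article does not describe AIM Intelligence’s test harness, but structured red-team evaluations of this kind are typically automated: prompt variants are submitted for each risk category and every response is classified as a refusal or a compliance. The sketch below, written against Anthropic’s Python SDK, illustrates the shape of such a loop; the model ID, the placeholder probes, and the keyword-based refusal heuristic are all assumptions for illustration, not the team’s actual methodology.

# Minimal red-team evaluation loop (illustrative sketch, not AIM
# Intelligence's actual methodology): submit prompt variants per risk
# category and record whether the model refuses. The model ID, probe
# prompts, and refusal heuristic below are all assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder probes only; real adversarial variants are intentionally omitted.
PROBES = {
    "biosecurity": ["<redacted probe variant 1>", "<redacted probe variant 2>"],
    "cybersecurity": ["<redacted probe variant 3>"],
}

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def is_refusal(text: str) -> bool:
    # Crude keyword heuristic; see the note below on its limitations.
    return any(marker in text.lower() for marker in REFUSAL_MARKERS)

results = {}
for category, prompts in PROBES.items():
    refusals = 0
    for prompt in prompts:
        response = client.messages.create(
            model="claude-opus-4-6",  # hypothetical model ID
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        if is_refusal(response.content[0].text):
            refusals += 1
    results[category] = refusals / len(prompts)

for category, rate in results.items():
    print(f"{category}: refusal rate {rate:.0%}")

In practice, keyword matching badly undercounts soft or partial refusals, which is why production evaluations rely on a trained judge model or human annotators to classify responses.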

This disclosure represents the second major AI safety failure reported by AIM Intelligence in recent weeks. Previously, the team demonstrated a rapid jailbreak of Google’s Gemini 3 Pro, neutralizing its filtering mechanisms in under five minutes. To highlight the severity of the breach, researchers prompted the compromised model to generate a satirical self-assessment of its failure: an internal presentation titled “Jailbroken Fool Gemini 3.”

Growing Risks in the Agentic AI Era

The security implications are particularly concerning for Opus 4.6, which features significantly enhanced agentic capabilities—functions that enable AI systems to make judgments and execute actions with minimal human oversight. As these autonomous decision-making features become more powerful, the potential consequences of successful jailbreaks escalate proportionally.

Anthropic’s own system card reveals a critical design tradeoff: the model’s refusal rate for AI safety research queries dropped from approximately 60% to just 14% in Opus 4.6. While intended to make the model more helpful for legitimate safety research, this change inadvertently created a near-universal jailbreak vector that AIM Intelligence’s team exploited across multiple sensitive topics—transforming what should have been robust safety guardrails into a systematic vulnerability.
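
To put those system-card figures in perspective: a drop from roughly 60% to 14% refusals means the share of sensitive safety-research queries the model actually answers rises from about 40% to 86%, and the refusal rate shrinks more than fourfold. A quick check of that arithmetic:

# Refusal rates cited from Anthropic's system card as reported above;
# the interpretation here is illustrative arithmetic, not a new finding.
old_refusal, new_refusal = 0.60, 0.14

old_answered = 1 - old_refusal  # ~40% of safety-research queries answered
new_answered = 1 - new_refusal  # ~86% answered

print(f"Answered before: {old_answered:.0%}, after: {new_answered:.0%}")
print(f"Refusals shrank by a factor of {old_refusal / new_refusal:.1f}x")
# Output: Answered before: 40%, after: 86%
#         Refusals shrank by a factor of 4.3x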

“The disconnect between AI performance metrics and security robustness represents a fundamental challenge for the industry,” Park added. “Models achieving state-of-the-art results on standard benchmarks can still be compromised within minutes, and traditional safety approaches aren’t scaling with capability advances.”

