
Leading AI Model Claude Opus 4.6 Bypassed in 30 Minutes, Exposing Critical Security Gap in Agentic AI Systems


AIM Intelligence, a Seoul-based AI safety company, announced that its security research team bypassed the safety mechanisms of Anthropic’s Claude Opus 4.6, the company’s highest-performance AI model, within 30 minutes of its release on February 6. The jailbreak enabled the model to provide detailed instructions for manufacturing biochemical weapons, including sarin gas and the smallpox virus, highlighting critical vulnerabilities in current AI safety systems.

The findings come amid growing industry concern that safety mechanisms are failing to keep pace with rapidly advancing AI capabilities, particularly in agentic AI systems designed to make autonomous decisions and take actions on behalf of humans.

“This successful jailbreak demonstrates that even top-tier AI models share common security vulnerabilities,” said Ha-on Park, CTO of AIM Intelligence. “As attacks on AI systems become increasingly sophisticated and agentic capabilities expand, understanding and defending against these vulnerabilities will be critical for the industry.”

Systematic Vulnerabilities Across Leading Models

In controlled red-team testing (structured adversarial evaluations designed to surface latent AI safety failures), researchers at AIM Intelligence identified critical weaknesses in Claude’s refusal and containment mechanisms. Under specific prompt conditions, the model bypassed its safeguards and generated actionable, step-by-step guidance on prohibited biological threats, including anthrax and smallpox, pathogens universally classified as high-risk agents with severe real-world public health and national security implications.

These outputs went beyond abstract discussion or historical context, crossing into procedural framing that would normally be blocked by safety systems. The findings underscore how even state-of-the-art models can, when improperly constrained, surface knowledge that could be misused for bioterrorism, mass-casualty planning, or biological weapons development if accessed by malicious actors.
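
The article does not describe AIM Intelligence’s test harness, but structured red-team evaluations of this kind are typically automated: prompt variants are submitted for each risk category and every response is classified as a refusal or a compliance. The sketch below, written against Anthropic’s Python SDK, illustrates the shape of such a loop; the model ID, the placeholder probes, and the keyword-based refusal heuristic are all assumptions for illustration, not the team’s actual methodology.

# Minimal red-team evaluation loop (illustrative sketch, not AIM
# Intelligence's actual methodology): submit prompt variants per risk
# category and record whether the model refuses. The model ID, probe
# prompts, and refusal heuristic below are all assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder probes only; real adversarial variants are intentionally omitted.
PROBES = {
    "biosecurity": ["<redacted probe variant 1>", "<redacted probe variant 2>"],
    "cybersecurity": ["<redacted probe variant 3>"],
}

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def is_refusal(text: str) -> bool:
    # Crude keyword heuristic; see the note below on its limitations.
    return any(marker in text.lower() for marker in REFUSAL_MARKERS)

results = {}
for category, prompts in PROBES.items():
    refusals = 0
    for prompt in prompts:
        response = client.messages.create(
            model="claude-opus-4-6",  # hypothetical model ID
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        if is_refusal(response.content[0].text):
            refusals += 1
    results[category] = refusals / len(prompts)

for category, rate in results.items():
    print(f"{category}: refusal rate {rate:.0%}")

In practice, keyword matching badly undercounts soft or partial refusals, which is why production evaluations rely on a trained judge model or human annotators to classify responses.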

This disclosure represents the second major AI safety failure reported by AIM Intelligence in recent weeks. Previously, the team demonstrated a rapid jailbreak of Google’s Gemini 3 Pro, neutralizing its filtering mechanisms in under five minutes. To highlight the severity of the breach, researchers prompted the compromised model to generate a satirical self-assessment of its failure: an internal presentation titled “Jailbroken Fool Gemini 3.”

Growing Risks in the Agentic AI Era

The security implications are particularly concerning for Opus 4.6, which features significantly enhanced agentic capabilities—functions that enable AI systems to make judgments and execute actions with minimal human oversight. As these autonomous decision-making features become more powerful, the potential consequences of successful jailbreaks escalate proportionally.

Anthropic’s own system card reveals a critical design tradeoff: the model’s refusal rate for AI safety research queries dropped from approximately 60% to just 14% in Opus 4.6. While intended to make the model more helpful for legitimate safety research, this change inadvertently created a near-universal jailbreak vector that AIM Intelligence’s team exploited across multiple sensitive topics—transforming what should have been robust safety guardrails into a systematic vulnerability.
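
To put those system-card figures in perspective: a drop from roughly 60% to 14% refusals means the share of sensitive safety-research queries the model actually answers rises from about 40% to 86%, and the refusal rate shrinks more than fourfold. A quick check of that arithmetic:

# Refusal rates cited from Anthropic's system card as reported above;
# the interpretation here is illustrative arithmetic, not a new finding.
old_refusal, new_refusal = 0.60, 0.14

old_answered = 1 - old_refusal  # ~40% of safety-research queries answered
new_answered = 1 - new_refusal  # ~86% answered

print(f"Answered before: {old_answered:.0%}, after: {new_answered:.0%}")
print(f"Refusals shrank by a factor of {old_refusal / new_refusal:.1f}x")
# Output: Answered before: 40%, after: 86%
#         Refusals shrank by a factor of 4.3x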

“The disconnect between AI performance metrics and security robustness represents a fundamental challenge for the industry,” Park added. “Models achieving state-of-the-art results on standard benchmarks can still be compromised within minutes, and traditional safety approaches aren’t scaling with capability advances.”

